THE SPECTRAL NORM OF RANDOM INNER-PRODUCT KERNEL MATRICES

ZHOU FAN^1 AND ANDREA MONTANARI^{1,2}


Abstract. We study an “inner-product kernel” random matrix model, whose empirical spectral 
distribution was shown by Xiuyuan Cheng and Amit Singer to converge to a deterministic measure 
in the large n and p limit. We provide an interpretation of this limit measure as the additive free 
convolution of a semicircle law and a Marcenko-Pastur law. By comparing the tracial moments 
of this random matrix to those of a deformed GUE matrix with the same limiting spectrum, we 
establish that for odd kernel functions, the spectral norm of this matrix converges almost surely
to the edge of the limiting spectrum. Our study is motivated by the analysis of a covariance 
thresholding procedure for the statistical detection and estimation of sparse principal components, 
and our results characterize the limit of the largest eigenvalue of the thresholded sample covariance 
matrix in the null setting. 


1. Introduction 


Let $X \in \mathbb{R}^{p \times n}$ be a random matrix with independent entries of mean 0 and variance 1, and let $\hat\Sigma = n^{-1} X X^T$ be the sample covariance. Define a matrix $K(X) \in \mathbb{R}^{p \times p}$ entrywise as

$$K(X)_{ii'} = \begin{cases} \frac{1}{\sqrt{n}}\, k\big(\sqrt{n}\, \hat\Sigma_{ii'}\big) & i \neq i' \\ 0 & i = i' \end{cases} \tag{1}$$

where $k : \mathbb{R} \to \mathbb{R}$ is a (nonlinear) “kernel” function. In this paper, we study the spectral norm $\|K(X)\|$ in the asymptotic regime $n, p \to \infty$ such that $p/n \to \gamma \in (0, \infty)$, when $k$ is a fixed function independent of $n$ and $p$.
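As a concrete illustration of definition (1), the following minimal sketch builds $K(X)$ for a small Gaussian matrix and evaluates its spectral norm. The function name `kernel_matrix`, the matrix sizes, and the choice of `np.tanh` as an odd kernel are assumptions made only for this example.

```python
import numpy as np

def kernel_matrix(X, k):
    """Return K(X) as in (1): off-diagonal entries k(sqrt(n) * S_ii') / sqrt(n), zero diagonal."""
    p, n = X.shape
    S = X @ X.T / n                      # sample covariance, p x p
    K = k(np.sqrt(n) * S) / np.sqrt(n)   # kernel applied entrywise to values of size O(1)
    np.fill_diagonal(K, 0.0)             # definition (1) sets the diagonal to zero
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))       # illustrative sizes: p = 50, n = 100
K = kernel_matrix(X, np.tanh)            # tanh is an arbitrary odd kernel used for illustration
spectral_norm = np.linalg.norm(K, 2)     # largest singular value of K
```

For a linear kernel $k(x) = x$, the same function returns the sample covariance with its diagonal removed.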

Our study of this model is motivated by the analysis of a covariance thresholding procedure 
proposed in [36] and subsequently analyzed in [22] for the sparse PCA problem in statistics. In the 
simplest setting, this problem may be formulated as follows: 


1.1. Sparse PCA. Consider a data matrix $X \in \mathbb{R}^{p \times n}$ with independent columns distributed as $\mathcal{N}(0, \Sigma)$, where $\Sigma$ is a $p \times p$ covariance matrix of the “spiked model” form

$$\Sigma = \mathrm{Id} + \lambda v v^T \tag{2}$$

with $\lambda > 0$ a constant and $v \in \mathbb{R}^p$ a vector of unit Euclidean norm. Assume further that $\|v\|_0 \ll p$ where $\|v\|_0$ denotes the number of nonzero entries of $v$, and (for simplicity of discussion) that each such nonzero entry equals $\pm 1/\sqrt{\|v\|_0}$. Based on observing $X$, we would like to detect the spike (i.e. distinguish this from the null model $\Sigma = \mathrm{Id}$) and to recover the support of $v$ [1, 5].

As $n, p \to \infty$ with $p/n \to \gamma \in (0, \infty)$, in the “supercritical” regime $\lambda > \lambda^*$ where $\lambda^* := \sqrt{\gamma}$, the largest eigenvalue $\lambda_{\max}(\hat\Sigma)$ separates from the bulk, and the corresponding eigenvector $\hat v$ partially aligns with $v$. Consequently, consistent spike detection and support recovery may be performed using $\lambda_{\max}(\hat\Sigma)$ and $\hat v$ [36]. However, in the “subcritical” regime $\lambda < \lambda^*$, $|\hat v^T v| \to 0$ almost surely, $\lambda_{\max}(\hat\Sigma)$ cannot distinguish the null and spiked models, and furthermore no test using only the eigenvalues of $\hat\Sigma$ can distinguish the models with probability approaching one [321K, 1 -FIT: 13).

Figure 1. Largest eigenvalue of the thresholded covariance matrix $M_\tau(X)$ for a threshold function $k_\tau$ under $\Sigma = \mathrm{Id}$ (red circles) and $\Sigma = \mathrm{Id} + \lambda v v^T$ (blue triangles), for $n = p = 2000$, $\lambda = 0.9$, and $\|v\|_0 = 0.3\sqrt{n}$. The asymptotic prediction $\|\mu_{a,\nu,\gamma}\| + 1$ is shown as the red curve, where $a := \mathbb{E}[\xi k_\tau(\xi)]$, $\nu := \mathbb{E}[k_\tau(\xi)^2]$, and $\gamma := p/n = 1$. The threshold function $k_\tau$ here is a smoothed soft-threshold, defined by $k_\tau(x) = 0$ for $|x| < 0.8\tau$, $k_\tau(x) = \mathrm{sign}(x)(|x| - \tau)_+$ for $|x| > 1.2\tau$, and quadratic interpolation in between.^1

In this regime, [36] proposed to exploit the sparsity of $v$ by applying a thresholding operation $x \mapsto k_\tau(\sqrt{n}\,x)/\sqrt{n}$ entrywise to $\hat\Sigma$ to yield a matrix $M_\tau(X)$, and then performing a spectral decomposition of $M_\tau(X)$. Here, $\tau > 0$ is a constant and $k_\tau : \mathbb{R} \to \mathbb{R}$ is a threshold function satisfying $k_\tau(x)/x \to 1$ as $x \to \pm\infty$ and $k_\tau(x) = 0$ for $|x| < \tau$, so that entries of $\hat\Sigma$ of magnitude less than $\tau/\sqrt{n}$ are set to 0 while large entries are essentially preserved. (The matrix $K(X)$ in (1) when $k := k_\tau$ is precisely $M_\tau(X)$ with diagonal set to 0.)
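One way to realize the smoothed soft-threshold described in the caption of Figure 1 is sketched below. The caption only says “quadratic interpolation”; the specific interpolant used here, $c(|x| - 0.8\tau)^2$ with $c = 1.25/\tau$, is an assumption of this sketch, chosen because it matches both the value and the slope of the soft-threshold at $|x| = 1.2\tau$ and vanishes with zero slope at $|x| = 0.8\tau$. The function name is hypothetical.

```python
import numpy as np

def smoothed_soft_threshold(x, tau):
    """Odd, continuously differentiable threshold function.

    Equals 0 for |x| <= 0.8*tau and sign(x) * (|x| - tau) for |x| >= 1.2*tau.
    """
    x = np.asarray(x, dtype=float)
    ax = np.abs(x)
    # Assumed interpolant on 0.8*tau < |x| < 1.2*tau: value 0.2*tau and slope 1 at |x| = 1.2*tau.
    quad = (1.25 / tau) * (ax - 0.8 * tau) ** 2
    mag = np.where(ax <= 0.8 * tau, 0.0, np.where(ax >= 1.2 * tau, ax - tau, quad))
    return np.sign(x) * mag

# Entrywise thresholding of a sample covariance at level tau / sqrt(n), as in the text.
rng = np.random.default_rng(0)
p, n, tau = 40, 80, 1.0
X = rng.standard_normal((p, n))
S = X @ X.T / n
M_tau = smoothed_soft_threshold(np.sqrt(n) * S, tau) / np.sqrt(n)
```

Because the function is odd and continuously differentiable, it falls under the smoothness condition mentioned in the footnote to Figure 1.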

The choice of threshold level $\tau/\sqrt{n}$ is motivated by the following consideration: For $\lambda < \lambda^*$, it is in fact conjectured that no polynomial-time algorithm can consistently detect the spike or recover the support of $v$ if $\|v\|_0 > n^{1/2+\epsilon}$, for any $\epsilon > 0$ 0136]. Hence it is believed that the most difficult setting which permits a computationally tractable solution to these problems is when $\|v\|_0 \asymp \sqrt{n}$. In this setting, both the non-zero off-diagonal entries of $\Sigma$ (the “signal”) and the fluctuations of the entries of $\hat\Sigma$ (the “noise”) are of order $1/\sqrt{n}$, so the threshold must also be of order $1/\sqrt{n}$ to preserve the signal while reducing the noise.

In [22], it was shown that for any $\lambda > 0$, spike detection and support recovery based on $\lambda_{\max}(M_\tau(X))$ and the corresponding eigenvector can succeed with probability approaching 1 when $\|v\|_0 \le c\sqrt{n}$, for some constants $c := c(\lambda) > 0$ and $\tau := \tau(c, \lambda) > 0$. This phenomenon is illustrated in Figure 1, which shows that for a range of thresholds $\tau$, there is a difference between the values of $\lambda_{\max}(M_\tau(X))$ under the null model $\Sigma = \mathrm{Id}$ and under a spiked alternative with $\lambda < \lambda^*$ and sparsity $\|v\|_0 \asymp \sqrt{n}$. The main result of this paper strengthens the non-asymptotic analysis in [22] under the null model $\Sigma = \mathrm{Id}$ to establish an exact asymptotic value for $\lambda_{\max}(M_\tau(X))$ in terms of $k_\tau$. Procedurally, this indicates the point above which this method should reject the null model in favor of a spiked alternative. We are not aware of a similar analytic characterization of the value of $\lambda_{\max}(M_\tau(X))$ under the alternative model; such a characterization may yield insight on the exact critical sparsity level $c^*(\lambda)$ and optimal choices of $\tau$ and $k_\tau$ for spike detection using this method to succeed.

^1 The proof of our main result requires a technical condition that $k(x)$ is continuously differentiable. Oftentimes threshold functions used in practice are not smooth in this sense, but the same qualitative phenomena regarding detection and support recovery should hold for both smooth and non-smooth thresholds.

In the null model of this example, since all diagonal entries of $\hat\Sigma$ concentrate around 1 and thresholding essentially preserves the diagonal, the thresholded sample covariance $M_\tau(X)$ satisfies $\|M_\tau(X) - (K(X) + \mathrm{Id})\| \to 0$, where $K(X)$ is as in (1) for $k := k_\tau$. Hence the largest eigenvalue limit of $M_\tau(X)$ is simply that of $K(X)$ translated by 1. For odd and increasing threshold functions, the condition of Corollary 1.5 below is satisfied, so the largest eigenvalue of $K(X)$ equals its spectral norm.


1.2. Properties of the limit measure. For the model (1), the weak limit of the empirical spectral measure $p^{-1}\sum_{i=1}^p \delta_{\lambda_i(K(X))}$ of $K(X)$ was characterized by Cheng and Singer [20, Theorem 3.4 and Remark 3.2]. We restate this result in the following form:


Theorem 1.1 (Cheng, Singer). Let $X \in \mathbb{R}^{p \times n}$ have entries $x_{ij} \overset{IID}{\sim} \mathcal{N}(0,1)$. For $\xi \sim \mathcal{N}(0,1)$, suppose $\mathbb{E}[k(\xi)] = 0$, $\mathbb{E}[k(\xi)^2] < \infty$, and $\int k(x)^2\, |q_n(x) - q(x)|\, dx \to 0$ as $n \to \infty$ where $q$ and $q_n$ are the density functions of the laws of $\xi$ and $\sqrt{n}\,\hat\Sigma_{12}$. Then, denoting $a := \mathbb{E}[\xi k(\xi)]$ and $\nu := \mathbb{E}[k(\xi)^2]$, as $n, p \to \infty$ with $p/n \to \gamma \in (0, \infty)$,

$$\frac{1}{p}\sum_{i=1}^p \delta_{\lambda_i(K(X))} \Rightarrow \mu_{a,\nu,\gamma}$$

weakly almost surely, where $\mu_{a,\nu,\gamma}$ is a deterministic measure whose Stieltjes transform $m : \mathbb{C}^+ \to \mathbb{C}^+$ is the unique solution (in $\mathbb{C}^+$, for any $z \in \mathbb{C}^+$) to the equation

$$-\frac{1}{m(z)} = z + a\left(1 - \frac{1}{1 + a\gamma m(z)}\right) + \gamma(\nu - a^2)\, m(z). \tag{3}$$


This result was generalized by Do and Vu to the setting of non-Gaussian entries $x_{ij}$ in [23].
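The parameters $a = \mathbb{E}[\xi k(\xi)]$ and $\nu = \mathbb{E}[k(\xi)^2]$ that index the limit measure can be approximated numerically for a given kernel. The sketch below uses Gauss-Hermite quadrature from NumPy; the helper name `kernel_moments` and the quadrature degree are assumptions of this example, and the quadrature is exact only for polynomial kernels of sufficiently low degree.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def kernel_moments(k, deg=80):
    """Approximate (E[k(xi)], E[xi k(xi)], E[k(xi)^2]) for xi ~ N(0, 1)."""
    nodes, weights = hermegauss(deg)          # quadrature rule for the weight exp(-x^2 / 2)
    weights = weights / np.sqrt(2 * np.pi)    # renormalize to the standard Gaussian density
    kv = k(nodes)
    mean = np.sum(weights * kv)
    a = np.sum(weights * nodes * kv)
    nu = np.sum(weights * kv ** 2)
    return mean, a, nu

# Linear kernel k(x) = x: a = nu = 1 (the Marcenko-Pastur case nu = a^2).
mean_lin, a_lin, nu_lin = kernel_moments(lambda x: x)
# Cubic Hermite kernel (x^3 - 3x) / sqrt(6): a = 0 and nu = 1 (a semicircle case).
mean_h3, a_h3, nu_h3 = kernel_moments(lambda x: (x ** 3 - 3 * x) / np.sqrt(6))
```

The first returned value gives a quick check of the centering condition $\mathbb{E}[k(\xi)] = 0$ required by Theorem 1.1.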

Before stating our main results, let us discuss some basic properties of this limit measure: For a linear kernel function $k(x) = ax$, $\mu_{a,\nu,\gamma}$ is a translation and rescaling of the Marcenko-Pastur law. Interestingly, it was observed in [20] that for kernel functions for which $a = 0$, $\mu_{a,\nu,\gamma}$ is a Wigner semicircle law. In fact, the measure $\mu_{a,\nu,\gamma}$ in general is the additive free convolution (in the sense of Voiculescu 153 ) of these two laws.


Proposition 1.2. Let $\mu_{sc}$ be the semicircle law supported on $[-2, 2]$ and let $\sqrt{\gamma(\nu - a^2)}\,\mu_{sc}$ denote the law of $\sqrt{\gamma(\nu - a^2)}\,y$ for $y \sim \mu_{sc}$. Let $\mu_{MP,\gamma}$ be the standard Marcenko-Pastur law that is the limiting spectral measure of $n^{-1}XX^T$ when $X \in \mathbb{R}^{p \times n}$ and $p/n \to \gamma$, and let $a(\mu_{MP,\gamma} - 1)$ denote the law of $a(y - 1)$ for $y \sim \mu_{MP,\gamma}$. Then

$$\mu_{a,\nu,\gamma} = a(\mu_{MP,\gamma} - 1) \boxplus \sqrt{\gamma(\nu - a^2)}\,\mu_{sc}.$$

Proof. By (3), the measure $\mu_{a,\nu,\gamma}$ has $\mathcal{R}$-transform

$$\mathcal{R}(z) = -a\left(1 - \frac{1}{1 - a\gamma z}\right) + \gamma(\nu - a^2)\, z. \tag{4}$$

It is easily verified that $-a(1 - 1/(1 - a\gamma z))$ is the $\mathcal{R}$-transform of $a(\mu_{MP,\gamma} - 1)$ and $\gamma(\nu - a^2)z$ is the $\mathcal{R}$-transform of $\sqrt{\gamma(\nu - a^2)}\,\mu_{sc}$, and the result follows from additivity of $\mathcal{R}$-transforms under additive free convolution [57]. □
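The passage from (3) to (4) can be spelled out as follows. This is a sketch under an assumed sign convention, which the text does not state explicitly: $G(z) = \int (z - x)^{-1}\,d\mu(x) = -m(z)$, with the $\mathcal{R}$-transform defined by $\mathcal{R}(G(z)) + 1/G(z) = z$.

```latex
% Assumed convention: G(z) = -m(z) and R(G(z)) + 1/G(z) = z.
\begin{align*}
-\frac{1}{m(z)} &= z + a\Big(1 - \frac{1}{1 + a\gamma\, m(z)}\Big) + \gamma(\nu - a^2)\, m(z)
  && \text{(equation (3))} \\
\frac{1}{G(z)} &= z + a\Big(1 - \frac{1}{1 - a\gamma\, G(z)}\Big) - \gamma(\nu - a^2)\, G(z)
  && \text{(substituting } m = -G\text{)} \\
\mathcal{R}(G(z)) = z - \frac{1}{G(z)}
  &= -a\Big(1 - \frac{1}{1 - a\gamma\, G(z)}\Big) + \gamma(\nu - a^2)\, G(z)
  && \text{(equation (4) at the point } G(z)\text{)}
\end{align*}
% The two summands are identified using standard facts:
%   the Marcenko-Pastur law mu_{MP,gamma} has R-transform 1 / (1 - gamma z),
%   a shift by -1 subtracts 1, and scaling by a maps R(z) to a R(a z), giving
%   a (1 / (1 - a gamma z) - 1) = -a (1 - 1 / (1 - a gamma z));
%   a semicircle law of variance sigma^2 has R-transform sigma^2 z,
%   here with sigma^2 = gamma (nu - a^2).
```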


Recalling that the additive free convolution of semicircle laws is itself a semicircle law, Proposition 1.2 implies the following further decomposition of $\mu_{a,\nu,\gamma}$: Let $\{h_d\}_{d=0}^\infty$ denote any orthonormal basis of functions $f : \mathbb{R} \to \mathbb{R}$ with respect to the inner product $\langle f, g \rangle_\xi := \mathbb{E}[f(\xi)g(\xi)]$ when $\xi \sim \mathcal{N}(0,1)$, where $h_0(x) = 1$ and $h_1(x) = x$. Consider the corresponding orthogonal decomposition of the kernel function

$$k(x) = \sum_{d=1}^\infty a_d h_d(x)$$

(where $a_0 = 0$ because $\mathbb{E}[k(\xi)] = 0$), and the decomposition

$$K(X) = \sum_{d=1}^\infty K_d(X)$$

where $K_d(X)$ is the matrix (1) with kernel function $a_d h_d(x)$. Letting $\mu_d$ denote the limiting spectral measure of $K_d(X)$, which is $a(\mu_{MP,\gamma} - 1)$ for $d = 1$ and $\sqrt{\gamma a_d^2}\,\mu_{sc}$ for $d \ge 2$, the limiting spectral measure of $K(X)$ is given by

$$\mu_{a,\nu,\gamma} = \mu_1 \boxplus \mu_2 \boxplus \mu_3 \boxplus \cdots. \tag{5}$$

In the proof of our main result, we will apply such a decomposition of the kernel matrix when each $h_d$ is the degree-$d$ Hermite polynomial.

Proposition 1.2 implies, via the general analysis of [7], that $\mu_{a,\nu,\gamma}$ is compactly supported, has one interval of support when $\gamma \le 1$ and at most two intervals of support when $\gamma > 1$, and (except for the singularity at 0 in the Marcenko-Pastur case $\nu = a^2$ and $\gamma > 1$) admits a density on all of $\mathbb{R}$ that is analytic in the interior of the support. The following may also be deduced from the $\mathcal{R}$-transform:


Proposition 1.3. Let $\mathrm{supp}(\mu_{a,\nu,\gamma})$ denote the support of $\mu_{a,\nu,\gamma}$. If $a > 0$, then

$$\max\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\} \ge -\min\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\},$$

and if $a < 0$, then

$$\max\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\} \le -\min\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\}.$$

Proof. Replacing $k(x)$ by $-k(x)$, it suffices to consider $a > 0$. The $\mathcal{R}$-transform (4) admits the series expansion

$$\mathcal{R}(z) = \gamma\nu z + \sum_{l \ge 2} a^{l+1}\gamma^l z^l$$

around $z = 0$, implying that the free cumulants of $\mu_{a,\nu,\gamma}$ are given by $\kappa_1 = 0$, $\kappa_2 = \gamma\nu$, and $\kappa_l = a^l\gamma^{l-1}$ for $l \ge 3$ m •. The moments of $\mu_{a,\nu,\gamma}$ are then

$$\int x^l\, \mu_{a,\nu,\gamma}(dx) = \sum_{\pi \in NC_l}\ \prod_{S \in \pi} \kappa_{|S|}$$

where $NC_l$ denotes the set of all non-crossing partitions of $\{1, \ldots, l\}$ [ST]. In particular, when $a > 0$, all moments of $\mu_{a,\nu,\gamma}$ are non-negative, whereas if $\max\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\} < -\min\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\}$, then the $l$th moment must be negative for a sufficiently large odd integer $l$. □

The support of $\mu_{a,\nu,\gamma}$ is easily numerically computed, as (3) is a cubic equation in $m(z)$, and $\mathrm{supp}(\mu_{a,\nu,\gamma})$ is the set of $z \in \mathbb{R}$ for which this cubic equation has a non-real root. The explicit form for the density function of $\mu_{a,\nu,\gamma}$ was provided in [20, Appendix A].
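The numerical recipe just described can be sketched as follows. Clearing denominators in (3) gives, with $c := \gamma(\nu - a^2)$, the cubic $c\,a\gamma\, m^3 + ((z + a)a\gamma + c)\, m^2 + (z + a\gamma)\, m + 1 = 0$. The function names, the grid scan, the tolerance, and the a priori bound on the support (the sum of the norms of the two freely convolved laws, plus a margin) are choices made for this sketch rather than part of the original text.

```python
import numpy as np

def in_support(z, a, nu, gamma, tol=1e-6):
    """True if the cubic in m derived from (3) has a non-real root at the real point z."""
    c = max(gamma * (nu - a ** 2), 0.0)
    coeffs = [c * a * gamma, (z + a) * a * gamma + c, z + a * gamma, 1.0]
    roots = np.roots(coeffs)   # np.roots drops leading zero coefficients (degenerate cases)
    return bool(np.any(np.abs(roots.imag) > tol))

def support_edges(a, nu, gamma, step=2e-3):
    """Grid estimate of (min, max) of supp(mu_{a,nu,gamma}), accurate up to about `step`."""
    c = max(gamma * (nu - a ** 2), 0.0)
    bound = abs(a) * (gamma + 2 * np.sqrt(gamma)) + 2 * np.sqrt(c) + 1.0
    grid = np.arange(-bound, bound, step)
    inside = np.array([in_support(z, a, nu, gamma) for z in grid])
    return grid[inside].min(), grid[inside].max()

lo_sc, hi_sc = support_edges(a=0.0, nu=1.0, gamma=1.0)   # semicircle case a = 0
lo_mp, hi_mp = support_edges(a=1.0, nu=1.0, gamma=1.0)   # Marcenko-Pastur case nu = a^2
lo_g, hi_g = support_edges(a=0.5, nu=1.0, gamma=2.0)     # a generic case with a > 0
norm_g = max(abs(lo_g), abs(hi_g))                       # estimate of ||mu_{a,nu,gamma}||
```

In the two special cases the estimates can be compared with the closed forms $[-2\sqrt{\gamma\nu},\, 2\sqrt{\gamma\nu}]$ and $[a(\gamma - 2\sqrt{\gamma}),\, a(\gamma + 2\sqrt{\gamma})]$, and the generic case is consistent with the asymmetry asserted by Proposition 1.3.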
1.3. Main results. Denoting $\|\mu_{a,\nu,\gamma}\| = \max\{|x| : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\}$, the following is the main result of this paper:


Theorem 1.4. Suppose $k : \mathbb{R} \to \mathbb{R}$ is odd (i.e. $k(-x) = -k(x)$) and continuously differentiable, with $|k'(x)| \le Ae^{\beta|x|}$ for some constants $A, \beta > 0$ and all $x \in \mathbb{R}$. Let $X \in \mathbb{R}^{p \times n}$ have entries $x_{ij} \overset{IID}{\sim} \mathcal{N}(0,1)$. Then with $\mu_{a,\nu,\gamma}$ as defined in Theorem 1.1, almost surely as $n, p \to \infty$ with $p/n \to \gamma \in (0, \infty)$,

$$\|K(X)\| \to \|\mu_{a,\nu,\gamma}\|.$$

Proposition 1.3 yields the following corollary:


Corollary 1.5. Under the conditions of Theorem 1.4, if $a := \mathbb{E}[\xi k(\xi)] > 0$, then almost surely

$$\lambda_{\max}(K(X)) \to \max\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\}.$$

(It may be verified, cf. our proof of the above corollary in Section 2, that any kernel function $k$ satisfying the conditions of Theorem 1.4 also satisfies the conditions of Theorem 1.1.)

We will prove Theorem 1.4 via the following two auxiliary results, the first giving a non-asymptotic concentration bound on $\|K(X)\|$ that is of constant order when $n \asymp p$, and the second providing an asymptotically tight bound in the case where $k(x)$ is a polynomial function:


Theorem 1.6. Suppose $k : \mathbb{R} \to \mathbb{R}$ is odd, continuous, and differentiable almost everywhere with $|k'(x)| \le Ae^{\beta|x|}$ for some $A, \beta > 0$ and all $x \in \mathbb{R}$. Let $X \in \mathbb{R}^{p \times n}$ have entries $x_{ij} \overset{IID}{\sim} \mathcal{N}(0,1)$. Then, for any $\alpha > 0$, there exist constants $C, C' > 0$ depending only on $A$, $\beta$, and $\alpha$ such that

$$\mathbb{P}\left[\|K(X)\| > C\max\left(\frac{p}{n}, \sqrt{\frac{p}{n}}\right)\right] \le C'\left(p^{-\alpha} + pe^{-\alpha n}\right).$$


Theorem 1.7. Let $k$ be a polynomial function such that $\mathbb{E}[k(\xi)] = 0$ when $\xi \sim \mathcal{N}(0,1)$, and let $a_2 := \mathbb{E}[k(\xi)\frac{1}{\sqrt{2}}(\xi^2 - 1)]$. Let $X \in \mathbb{R}^{p \times n}$ have IID entries that are symmetric in law ($x_{ij} \overset{L}{=} -x_{ij}$) and satisfy $\mathbb{E}[x_{ij}^2] = 1$ and

$$\mathbb{E}[|x_{ij}|^k] \le k^{\alpha k} \tag{6}$$

for all $k \ge 2$ and some $\alpha > 0$. Then

$$K(X) = \tilde{K}(X) + \tilde{R}(X)$$

where $\tilde{K}(X)$ and $\tilde{R}(X)$ are such that, as $n, p \to \infty$ with $p/n \to \gamma \in (0, \infty)$,

(1) $\|\tilde{K}(X)\| \to \|\mu_{a,\nu,\gamma}\|$ almost surely, and

(2) $\tilde{R}(X) = 0$ if $a_2 = 0$, and otherwise $\tilde{R}(X)$ is of rank at most two, with non-zero eigenvalues converging to $\pm a_2\gamma\sqrt{(\mathbb{E}[x_{ij}^4] - 1)/2}$.

The precise form of the rank-two matrix $\tilde{R}(X)$ is given by (9) in Section 2. We make the trivial observation that $a_2 = 0$ and $\tilde{R}(X) = 0$ if the polynomial $k$ is an odd function.


Theorem 1.4 follows from Theorems 1.6 and 1.7 via a polynomial approximation argument, which we present in Section 2. The assumption that $k(x)$ is odd, or more specifically that $a_2 = 0$, is important: Figure 2 displays the simulated spectrum of $K(X)$ for a kernel function where $a_2 \neq 0$, in which we see that $\tilde{R}(X)$ contributes two spike eigenvalues to $K(X)$ that fall outside of $\mathrm{supp}(\mu_{a,\nu,\gamma})$. In the covariance thresholding application of Section 1.1, commonly-used threshold functions are indeed odd. We recommend caution if using a non-odd threshold function, as the possible presence of these spurious spike eigenvalues may lead to the incorrect inference that $\Sigma$ has non-trivial spike eigenvectors, even in this null setting where $\Sigma = \mathrm{Id}$.

Figure 2. Simulated spectrum of $K(X)$ when $k(x) := h_2(x) + h_3(x) = \frac{1}{\sqrt{2}}(x^2 - 1) + \frac{1}{\sqrt{6}}(x^3 - 3x)$, $n = 1000$, and $p = 10000$. The semicircle limit for the spectral distribution is superimposed in black, and the locations of two observed outlier eigenvalues of $K(X)$ are indicated with red arrows. (Horizontal axis: eigenvalue.)


1.4. Further related literature. The off-diagonal entries of $K(X)$ are the evaluations of a symmetric kernel $f(u, v) := k(u^T v/\sqrt{n})/\sqrt{n}$ on pairs of rows of $X$. Such matrices for general kernels $f(u, v)$ are used in “kernel methods” in statistics and machine learning, such as SVM classifiers [10] and kernel PCA m-. Koltchinskii and Gine [35] studied the spectra of kernel matrices in a regime where each row of $X$ is sampled from a probability distribution over a fixed space (for example $\mathbb{R}^n$ for fixed $n$), showing that under suitable conditions, as $p \to \infty$, the spectrum converges to that of a limiting infinite-dimensional operator. El Karoui [25] studied kernel matrices in the regime $n, p \to \infty$ with $p/n \to \gamma \in (0, \infty)$ under the alternative scaling $f(u, v) := k(u^T v/n)$, showing that under mild conditions, the matrix is asymptotically equivalent to a linear combination of $XX^T$, the all-1's matrix, and the identity, and hence the limiting spectrum is Marcenko-Pastur. The scaling in (1) is different from the regime considered in [25]: Each off-diagonal entry of $\hat\Sigma$ has typical size $1/\sqrt{n}$, and hence (1) applies the nonlinearity $k$ to values of size $O(1)$ rather than $O(1/\sqrt{n})$. This and more general scalings were studied probabilistically in [20], and the results were further generalized in [23]. Let us remark that [ 551 [25] considered distributions for the rows of $X$ where the entries are not necessarily IID, but that the extension of our result to more general covariances $\Sigma \in \mathbb{R}^{p \times p}$ for the application of Section 1.1 will require the study of a model in which the columns (rather than the rows) of $X$ are independent with this covariance.

Sparse PCA has been widely studied in statistics for both the “single spike” model (2) as well as multi-spike models. Computationally-efficient procedures for estimating sparse principal components include diagonal thresholding m\mm, l\- and model-selection-penalization approaches [331 E21 ED S9U601 H23 , iterative thresholding via the QR method m, approximate message passing [22], and covariance thresholding as discussed in Section 1.1 [551 [22]. From the theoretical perspective, both exact mm and approximate [32U9i sousa nans] sparsity models have been considered, and a major focus has been on rate-optimal recovery of the sparse eigenvectors, their spanned subspace, and/or the sparse covariance 01 SDj 121159], 13]. Support recovery and spike detection in the specific model (2) were considered in [1, 5, 6, 36, 22]. In this setting, non-polynomial-time algorithms can detect the spike and recover the support even when $v$ has sparsity near-linear in $n$ mElE!, but it is conjectured that polynomial-time methods require the higher sparsity levels $\|v\|_0 \lesssim \sqrt{n}$. This problem is closely related to the planted clique problem in computer science m upon which this conjecture is based, with a vector $v$ of sparsity $\|v\|_0 \asymp \sqrt{n}$ corresponding to a planted clique of size $k \asymp \sqrt{n}$ in a graph of $n$ vertices.

Consistency of elementwise hard-thresholding for estimating sparse covariance matrices was studied in ®l23j. Optimal rates of convergence under various matrix norms and sparsity models were established in mm , and generalizations to other thresholding functions and to “adaptive” entry-specific thresholds were studied respectively in [¥6] and ED. Many analyses assume that each row of $\Sigma$ contains $\ll \sqrt{n}$ non-zero elements and perform thresholding at the level $\sqrt{(\log p)/n}$ or higher, which does not apply to the regime of interest discussed in Section 1.1 where $v$ has sparsity $\|v\|_0 \asymp \sqrt{n}$ and non-zero elements of size $n^{-1/4}$. Thresholding at more general levels, including $1/\sqrt{n}$, was studied in [22], which established a special case of Theorem 1.6 for the soft-thresholding kernel function $k$. The proof of [22] may be extended to globally Lipschitz functions $k$, but we require the application of such a bound when $k$ is the difference of the (possibly Lipschitz) kernel function of interest and a polynomial approximation to this function. This difference may increase at any polynomial rate as $|x| \to \infty$, hence requiring new ideas in the proof of Theorem 1.6 to extend beyond Lipschitz kernels. Other spectral norm bounds for polynomial kernels were derived in [20] and [M], but they do not yield the desired bound of constant order when restricted to our setting.


In the context of random matrix theory, convergence of the extremal eigenvalues of $K(X)$ was posed as an open question in m •. For linear kernels $k$, $K(X)$ is equivalent to a translation and rescaling of the sample covariance $\hat\Sigma$, and almost-sure convergence of the extremal eigenvalues follows from [2S, IBT1 2]. Proposition 1.2 implies that in the general case, $K(X)$ has the same limiting spectrum as a deformed Wigner matrix $W + V$ where $W$ is Wigner and $V$ is deterministic with spectral measure converging to $a(\mu_{MP,\gamma} - 1)$ 123 •. When $W$ is GUE and $V$ has no spike eigenvalues, the results of nasi] imply that the eigenvalues of $W + V$ stick to the limiting support, and the fluctuations of the eigenvalues at the edges of the support are also understood in various settings mmm-. The proof of our main result leverages the connection between these models.

Our proof uses the moment method and is different from the resolvent analysis of [20], although the decomposition of $k(x)$ in the Hermite polynomial basis plays an important role in both analyses. While the resolvent method has been successful in establishing many properties of Wigner and covariance matrices (see e.g. [501 EDI E3 [261 05] as well as the recent work of [ 52 ] m in a non-independent setting), the model (1) for nonlinear kernels does not have the same independence structure as these models, and it is also not a sum of rank-one updates. These difficulties were overcome in [20] via Gaussian conditioning arguments, but strengthening the bounds of [20] to yield finer control of the Stieltjes transform $m(z)$ near the real axis does not seem (in our viewpoint) more straightforward than our moment-based approach. We believe that our combinatorial estimates and moment-comparison argument, in the simpler setting of a fixed moment $l$ not varying with $n$, are sufficient to yield an alternative proof of Theorem 1.1 and also to establish asymptotic freeness of the matrices $K_1(X), K_2(X), \ldots$ leading to the decomposition (5). For brevity, we will not discuss this in the current paper.


1.5. Notation. $\|v\| = (\sum_i v_i^2)^{1/2}$ denotes the Euclidean norm for vectors. $\|X\| = \max_{\|v\| = 1}\|Xv\|$ denotes the spectral norm (i.e. $\ell_2$-operator norm) for matrices. $X_i$ denotes the $i$th row of $X$. If $X \in \mathbb{R}^{p \times p}$ is symmetric, $\lambda_{\max}(X) := \lambda_1(X) \ge \ldots \ge \lambda_p(X)$ denote the ordered eigenvalues of $X$. $\mathrm{supp}(\mu)$ denotes the support of a measure $\mu$, and $\|\mu\|$ denotes $\max\{|x| : x \in \mathrm{supp}(\mu)\}$.

In an asymptotic setting, for positive ($n, p$-dependent) quantities $a$ and $b$, $a \asymp b$ means $ca \le b \le Ca$ for constants $C, c > 0$, $a \sim b$ means $a/b \to 1$, $a \ll b$ means $a/b \to 0$, and $a \lesssim b$ means $a \le Cb$ for a constant $C > 0$.

We will use $i, i', i_1, i_2, \ldots$ for indices in $\{1, \ldots, p\}$, and $j, j', j_1, j_2, \ldots$ for indices in $\{1, \ldots, n\}$.


2. Overview of proof 


In this section, we summarize the high-level proof ideas for Theorems 1.6 and 1.7, and we establish Theorem 1.4 and Corollary 1.5 using these results.

The proof of Theorem 1.6 uses a covering net argument:

$$\|K(X)\| \le C\sup_{y \in D_2^p} y^T K(X) y,$$

for a constant $C > 0$ and a finite covering net $D_2^p$ of the unit ball $\{y \in \mathbb{R}^p : \|y\| \le 1\}$. We use a particular construction of a covering net due to Latala m-.

Definition 2.1. For $m = \lceil \log_2 p \rceil$, let

$$D_2^p = \left\{y \in \mathbb{R}^p : \|y\| \le 1,\ y_i^2 \in \{0, 1, 2^{-1}, 2^{-2}, \ldots, 2^{-(m+3)}\} \text{ for all } i\right\}.$$

For each $l = 0, 1, \ldots, m + 3$, let $\pi_l : D_2^p \to D_2^p$ be defined by $(\pi_l(y))_i = y_i \mathbb{1}\{y_i^2 \ge 2^{-l}\}$, and let $\pi_{l \setminus l-1} : D_2^p \to D_2^p$ be defined by $(\pi_{l \setminus l-1}(y))_i = y_i \mathbb{1}\{y_i^2 = 2^{-l}\}$.

Corresponding to the identity $y = \sum_{l=0}^{m+3} \pi_{l \setminus l-1}(y)$ for any $y \in D_2^p$, $y^T K(X) y$ may be decomposed as

$$y^T K(X) y = \sum_{l=0}^{m+3} \pi_l(y)^T K(X)\, \pi_{l \setminus l-1}(y) + \sum_{l=1}^{m+3} \pi_{l \setminus l-1}(y)^T K(X)\, \pi_{l-1}(y). \tag{7}$$

Each of the terms

$$\sup_{y \in D_2^p} \pi_l(y)^T K(X)\, \pi_{l \setminus l-1}(y), \qquad \sup_{y \in D_2^p} \pi_{l \setminus l-1}(y)^T K(X)\, \pi_{l-1}(y)$$

may be bounded via a standard union bound, with quantities of the form $F_{y,z}(X) := y^T K(X) z$ controlled by bounding the gradient $\|\nabla_X F_{y,z}(X)\|$ and applying Gaussian concentration of measure for Lipschitz functions. The key idea of the construction of $D_2^p$ and the decomposition (7) is that for each $l$, the union bound may be applied over $y \in \pi_l(D_2^p)$, which has smaller cardinality for smaller $l$. For larger $l$, the entries of $\pi_{l \setminus l-1}(y)$ are smaller, which we will show implies stronger control of the gradient $\|\nabla_X F_{y,z}(X)\|$ over a high-probability set. The moment generating function of $F_{y,z}(X)$ may be controlled using the integration argument of Maurey and Pisier, by extending this high-probability set to pairs of matrices $(X, X')$ in such a way that we remain in this set along the entire integration path. The cardinality of $\pi_l(D_2^p)$ balances the moment generating function bound thus obtained for each $l$, yielding Theorem 1.6. Details of this argument are given in Section 3.
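The decomposition (7) is a purely algebraic identity, and it can be checked numerically on a small example. In the sketch below, the vector $y$ is stored through integer levels (so that $y_i^2 = 2^{-\text{level}_i}$ can be tested exactly), and the symmetric test matrix stands in for $K(X)$; these representation choices, and all variable names, are assumptions of the example.

```python
import numpy as np

p = 8
m = int(np.ceil(np.log2(p)))                      # m = 3, so levels run over l = 0, ..., m + 3

# y_i^2 = 2^{-level_i}; level -1 marks a zero coordinate. Squared norm is 0.96875 <= 1.
level = np.array([1, 2, 3, 4, -1, 6, 6, -1])
signs = np.array([1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0, 1.0])
y = np.where(level >= 0, signs * 2.0 ** (-level / 2), 0.0)

def pi_l(l):
    """pi_l(y): keep coordinates with y_i^2 >= 2^{-l}, i.e. 0 <= level_i <= l."""
    return np.where((level >= 0) & (level <= l), y, 0.0)

def pi_l_minus(l):
    """pi_{l \\ l-1}(y): keep coordinates with y_i^2 == 2^{-l}, i.e. level_i == l."""
    return np.where(level == l, y, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((p, p))
K = (A + A.T) / 2                                 # arbitrary symmetric stand-in for K(X)
np.fill_diagonal(K, 0.0)

lhs = y @ K @ y
rhs = sum(pi_l(l) @ K @ pi_l_minus(l) for l in range(m + 4)) \
    + sum(pi_l_minus(l) @ K @ pi_l(l - 1) for l in range(1, m + 4))
```

The first sum collects pairs of coordinates whose row level is at most the column level, and the second collects the remaining pairs, so every pair is counted exactly once.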
The proof of Theorem 1.7 uses the moment method and a moment comparison with a deformed GUE matrix. We first define the orthonormal Hermite polynomials, which play a central role in our proof (as well as in the proof in [20] for Theorem 1.1):


Definition 2.2. Let $\{h_d\}_{d=0}^\infty$ denote the orthonormal Hermite polynomials with respect to the inner product $\langle f, g \rangle_\xi = \mathbb{E}[f(\xi)g(\xi)]$ when $\xi \sim \mathcal{N}(0,1)$, i.e. $h_d$ is of degree $d$ and $\langle h_d, h_{d'} \rangle_\xi = \mathbb{1}\{d = d'\}$.

The first few such polynomials are given by $h_0(x) = 1$, $h_1(x) = x$, $h_2(x) = \frac{1}{\sqrt{2}}(x^2 - 1)$, and $h_3(x) = \frac{1}{\sqrt{6}}(x^3 - 3x)$.
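The orthonormality in Definition 2.2 can be checked numerically. The sketch below builds $h_d$ from NumPy's probabilists' Hermite polynomials $He_d$ via $h_d = He_d/\sqrt{d!}$ and computes the Gram matrix by Gauss-Hermite quadrature; the helper name `h` and the quadrature degree are choices of this example.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

def h(d, x):
    """Orthonormal Hermite polynomial h_d(x) = He_d(x) / sqrt(d!)."""
    coeffs = np.zeros(d + 1)
    coeffs[d] = 1.0                            # select He_d in the probabilists' Hermite basis
    return hermeval(x, coeffs) / np.sqrt(factorial(d))

nodes, weights = hermegauss(20)                # exact for polynomials of degree <= 39
weights = weights / np.sqrt(2 * np.pi)         # renormalize to the N(0, 1) density

# Gram matrix of <h_d, h_e> = E[h_d(xi) h_e(xi)] for d, e = 0, ..., 4.
gram = np.array([[np.sum(weights * h(d, nodes) * h(e, nodes)) for e in range(5)]
                 for d in range(5)])
```

The resulting `gram` should be close to the $5 \times 5$ identity matrix, and `h(2, x)` and `h(3, x)` agree with the explicit formulas listed above.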


Our proof of Theorem 1.7 follows three high-level steps:

(1) For IID random variables $z_1, \ldots, z_n$ with $\mathbb{E}[z_i] = \mathbb{E}[z_i^3] = 0$ and $\mathbb{E}[z_i^2] = 1$, we show that

$$\sqrt{d!}\, h_d\left(\frac{1}{\sqrt{n}}\sum_{j=1}^n z_j\right) \approx n^{-d/2} \sum_{\substack{j_1, \ldots, j_d = 1 \\ \text{distinct}}}^n\ \prod_{i=1}^d z_{j_i}. \tag{8}$$

(The summation on the right side is over all tuples of distinct indices $j_1, \ldots, j_d \in \{1, \ldots, n\}$.) Each $h_d$ has leading coefficient $1/\sqrt{d!}$, so $\sqrt{d!}\, h_d(x) = x^d +$ lower degree terms. Replacing $\sqrt{d!}\, h_d(x)$ with $x^d$ on the left side would yield the right side of (8) without the restriction that the indices of summation $j_1, \ldots, j_d$ are distinct; (8) states that the terms of this summation in which the indices $j_1, \ldots, j_d$ are not distinct are essentially cancelled out by the lower degree terms of $\sqrt{d!}\, h_d(x)$. We prove this approximation in Section 4 by induction on $d$, using the three-term recurrence for Hermite polynomials. The right side of (8) is of typical size $O(1)$, and we also quantify the error of the approximation by computing a second-order term, which is of typical size $O(n^{-1/2})$, and showing that the third and higher-order terms in this approximation are of typical size $O(n^{-1})$.

(2) Since $k$ is a polynomial such that $\mathbb{E}[k(\xi)] = 0$, we may write

$$k(x) = \sum_{d=1}^D a_d h_d(x)$$

where $D < \infty$ is the degree of $k$. Applying the approximation in step (1) above to each $h_d$, we obtain a decomposition

$$K(X) = Q(X) + R(X) + S(X),$$

where $Q$, $R$, and $S$ correspond to the first-order, second-order, and third-and-higher-order terms of these approximations (each summed over all $d = 1, \ldots, D$). We establish for the first-order matrix $Q(X)$ that

$$\limsup_{n,p \to \infty} \|Q(X)\| \le \|\mu_{a,\nu,\gamma}\|$$

almost surely, via a moment comparison argument: For an even integer $l \asymp \log n$, we apply the standard moment method bound $\|Q(X)\|^l \le \mathrm{Tr}\, Q(X)^l$ [251129]. By (8), the non-diagonal entries of $Q(X)$ are given by

$$Q(X)_{ii'} = \sum_{d=1}^D \frac{a_d}{\sqrt{d!\, n^{d+1}}} \sum_{\substack{j_1, \ldots, j_d = 1 \\ \text{distinct}}}^n\ \prod_{s=1}^d x_{ij_s} x_{i'j_s}.$$

We expand the trace $\mathrm{Tr}\, Q(X)^l$ and interpret the terms of the resulting sum as labelings of a certain graph. We then consider a deformed GUE matrix $M = W + V$ having the same limiting spectrum as $K(X)$, and employ a combinatorial argument to upper-bound $\mathbb{E}[\mathrm{Tr}\, Q(X)^l]$ using $\mathbb{E}[\mathrm{Tr}\, M^l]$. We conclude the proof by using the known convergence result $\|M\| \to \|\mu_{a,\nu,\gamma}\|$ from JT6] and a concentration of measure argument to bound $\mathbb{E}[\mathrm{Tr}\, M^l]$. We present the main ideas of this step in Section 5, with details deferred to Appendices A and B.

(3) Finally, we analyze the remainder matrices $R(X)$ and $S(X)$ from the decomposition in step (2) above. It is easily shown that $\|S(X)\| \to 0$. For $R(X)$, we may write

$$R(X) = \sum_{d=2}^D R_d(X),$$

where $R_d(X)$ is the contribution from the Hermite polynomial $h_d$. (The linear polynomial $h_1$ does not have such a remainder term in the decomposition.) We show $\|R_d(X)\| \to 0$ for each $d \ge 3$, and $\|R_2(X) - \tilde{R}(X)\| \to 0$ where

$$\tilde{R}(X) = \frac{a_2}{\sqrt{2}\, n}\left(v(X)\mathbf{1}^T + \mathbf{1}v(X)^T\right), \tag{9}$$

$\mathbf{1} = (1, \ldots, 1) \in \mathbb{R}^p$, and $v(X) \in \mathbb{R}^p$ has entries $(v(X))_i = \sum_{j=1}^n (x_{ij}^2 - 1)/\sqrt{n}$. Noting that $\tilde{R}(X)$ is a rank-two matrix, this yields Theorem 1.7 upon setting $\tilde{K}(X) = K(X) - \tilde{R}(X)$. This argument and the conclusion of the proof of Theorem 1.7 are presented in Section 6.
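The rank-two structure in step (3) can be illustrated numerically. The sketch below forms the matrix in (9) for Gaussian entries (for which $\mathbb{E}[x_{ij}^4] = 3$) and compares its two non-zero eigenvalues with the limit $\pm a_2\gamma\sqrt{(\mathbb{E}[x_{ij}^4] - 1)/2}$ stated in Theorem 1.7. The sizes, the seed, and the value of `a2` are hypothetical choices for illustration; agreement is only approximate at finite $n$ and $p$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, a2 = 400, 800, 1.0                         # illustrative sizes; a2 is a hypothetical coefficient
X = rng.standard_normal((p, n))                  # Gaussian entries, so E[x^4] = 3

v = (X ** 2 - 1).sum(axis=1) / np.sqrt(n)        # (v(X))_i = sum_j (x_ij^2 - 1) / sqrt(n)
ones = np.ones(p)
R_tilde = a2 / (np.sqrt(2) * n) * (np.outer(v, ones) + np.outer(ones, v))   # equation (9)

eigs = np.linalg.eigvalsh(R_tilde)               # ascending order; all but two are numerically zero
rank = np.linalg.matrix_rank(R_tilde)
predicted = a2 * (p / n) * np.sqrt((3 - 1) / 2)  # a2 * gamma * sqrt((E[x^4] - 1) / 2)
```

The exact non-zero eigenvalues of $v\mathbf{1}^T + \mathbf{1}v^T$ are $v^T\mathbf{1} \pm \|v\|\sqrt{p}$, which is why the extreme entries of `eigs` should be close to $\pm$`predicted` up to fluctuations that vanish as $n, p \to \infty$.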


Let us now prove Theorem 1.4 and Corollary 1.5 using Theorems 1.6 and 1.7. We approximate the derivative of the kernel function by a polynomial using the following result:


Theorem 2.3 (Carleson PS]). Suppose $w(x)$ is an even, lower semi-continuous function on $\mathbb{R}$ with $1 \le w(x) \le \infty$, such that $\log w(x)$ is a convex function of $\log x$. Let $C_w$ be the class of continuous functions on $\mathbb{R}$ such that $\lim_{|x| \to \infty} f(x)/w(x) = 0$ for all $f \in C_w$, and suppose $C_w$ contains all polynomial functions. If $\int_1^\infty (\log w(x))/x^2\, dx = \infty$, then for any $f \in C_w$ and $\epsilon > 0$, there exists a polynomial $P$ such that $|f(x) - P(x)| < \epsilon w(x)$ for all $x \in \mathbb{R}$.


Proof of Theorem 1.4. By the given conditions, there exists $\beta > 0$ such that $\lim_{|x| \to \infty} |k'(x)|/e^{\beta|x|} = 0$. Applying Theorem 2.3 with $w(x) = e^{\beta|x|}$, for any $\epsilon > 0$, there exists a polynomial $q$ such that $|k'(x) - q(x)| < \epsilon e^{\beta|x|}$ for all $x \in \mathbb{R}$. As $k$ is an odd function, $k'$ is even, so we may take $q$ to be an even polynomial function. (Otherwise, take the polynomial to be $\frac{1}{2}(q(x) + q(-x))$.) Let $\tilde{q}(x) = \int_0^x q(t)\,dt$ for all $x \in \mathbb{R}$, and let $r(x) = k(x) - \tilde{q}(x)$. Then $\tilde{q}$ is an odd polynomial function, $r$ is hence also an odd function, and $|r'(x)| < \epsilon e^{\beta|x|}$ by construction. Let $Q(X)$ be the kernel matrix (1) with kernel function $\tilde{q}(x)$, and let $R(X)$ be the kernel matrix (1) with kernel function $r(x)$, so that $K(X) = Q(X) + R(X)$. (These matrices $Q(X)$ and $R(X)$ are not related to the matrices $Q$, $R$, and $S$ in the above proof outline.)

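Applying Theorem 1.6 with $\alpha = 2$ to $R(X)$, $\limsup_{n,p \to \infty} \|R(X)\| \le \epsilon C_{\beta,\gamma}$ almost surely for some constant $C_{\beta,\gamma} > 0$ (the probability bound of Theorem 1.6 is summable for $\alpha = 2$, so the Borel-Cantelli lemma applies). On the other hand, if $\tilde{q}(x) = a_{0,\epsilon} + a_{1,\epsilon}h_1(x) + \ldots + a_{D,\epsilon}h_D(x)$ where $h_1, \ldots, h_D$ are the orthonormal Hermite polynomials of Definition 2.2, then $a_{j,\epsilon} = 0$ for all even $j$ (since $\tilde{q}$ is an odd function), and Theorem 1.7 implies $\|Q(X)\| \to \|\mu_{a_\epsilon,\nu_\epsilon,\gamma}\|$ where $a_\epsilon = a_{1,\epsilon}$ and $\nu_\epsilon = \sum_{d=1}^D a_{d,\epsilon}^2$. Hence, almost surely,

$$\|\mu_{a_\epsilon,\nu_\epsilon,\gamma}\| - \epsilon C_{\beta,\gamma} \le \liminf_{n,p \to \infty} \|K(X)\| \le \limsup_{n,p \to \infty} \|K(X)\| \le \|\mu_{a_\epsilon,\nu_\epsilon,\gamma}\| + \epsilon C_{\beta,\gamma}$$

for any $\epsilon > 0$. Note that $|k(x) - \tilde{q}(x)| \le \frac{\epsilon}{\beta}e^{\beta|x|}$ for all $x \in \mathbb{R}$, so by dominated convergence $\lim_{\epsilon \to 0} \mathbb{E}[(k(\xi) - \tilde{q}(\xi))^2] = 0$ for $\xi \sim \mathcal{N}(0,1)$. Then $a_\epsilon \to a$ and $\nu_\epsilon \to \nu$ as $\epsilon \to 0$, where $a := \mathbb{E}[\xi k(\xi)]$ and $\nu := \mathbb{E}[k(\xi)^2]$. As $\|\mu_{a,\nu,\gamma}\|$ is continuous in $a$, $\nu$, and $\gamma$, $\lim_{\epsilon \to 0} \|\mu_{a_\epsilon,\nu_\epsilon,\gamma}\| = \|\mu_{a,\nu,\gamma}\|$, and hence taking $\epsilon \to 0$ yields $\lim_{n,p \to \infty} \|K(X)\| = \|\mu_{a,\nu,\gamma}\|$ almost surely. □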


Proof of Corollary 1.5. We verify the conditions of Theorem 1.1. The kernel function is odd and bounded as $|k(x)| \le Ce^{\beta|x|}$ for a constant $C := C_{A,\beta} > 0$, so $\mathbb{E}[k(\xi)] = 0$ and $\mathbb{E}[k(\xi)^2] < \infty$. Writing $y := \sqrt{n}\,\hat\Sigma_{12}$, for any $R > 0$

$$\mathbb{E}\left[k(y)^2\, \mathbb{1}\{|y| > R\}\right] \le C^2\, \mathbb{E}\left[e^{4\beta|y|}\right]^{1/2} \mathbb{P}\left[|y| > R\right]^{1/2}.$$

Note that $\mathbb{E}[e^{4\beta y}] = \mathbb{E}[e^{-4\beta y}] = (1 - 16\beta^2/n)^{-n/2}$ for all $n > 16\beta^2$, so $\mathbb{E}[e^{4\beta|y|}]$ is bounded by a constant for all large $n$. By Lemma C.4 of [20], $\mathbb{P}[|y| > R]^{1/2} \to 0$ as $R \to \infty$ uniformly in $n$. Then Lemma C.5 of [20] implies that the remaining technical condition of Theorem 1.1 holds. Theorems 1.1 and 1.4 then together imply

$$\max\{x : x \in \mathrm{supp}(\mu_{a,\nu,\gamma})\} \le \liminf_{n,p \to \infty} \lambda_{\max}(K(X)) \le \limsup_{n,p \to \infty} \lambda_{\max}(K(X)) \le \|\mu_{a,\nu,\gamma}\|,$$

and the result follows as the left and right sides coincide by Proposition 1.3. □


3. Proof of concentration inequality 

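In this section, we prove Theorem 1.6 following the outline sketched in Section 2. By rescaling $k(x)$, we may assume without loss of generality $A = 1$. We denote by $X_i$ the $i$th row of $X$.

Lemma 3.1. For any $\alpha, \beta > 0$, there exist constants $C, C' > 0$ depending only on $\alpha$ and $\beta$ such that the following holds: Define $\mathcal{G}(\alpha, \beta) \subset \mathbb{R}^{p \times n} \times \mathbb{R}^{p \times n}$ as the set of pairs $(X, X')$ of $p \times n$ matrices such that $\|X\| \le \sqrt{p} + (1 + \sqrt{2\alpha})\sqrt{n}$, $\|X'\| \le \sqrt{p} + (1 + \sqrt{2\alpha})\sqrt{n}$, and for each $l = 1, \ldots, p$

$$\frac{1}{p}\sum_{\substack{i=1 \\ i \ne l}}^p \exp\left(\frac{16\beta}{\sqrt{n}}\left|X_i^T X_l\right|\right) \le C, \qquad \frac{1}{p}\sum_{\substack{i=1 \\ i \ne l}}^p \exp\left(\frac{16\beta}{\sqrt{n}}\left|X_i'^T X_l\right|\right) \le C,$$

$$\frac{1}{p}\sum_{\substack{i=1 \\ i \ne l}}^p \exp\left(\frac{16\beta}{\sqrt{n}}\left|X_i^T X_l'\right|\right) \le C, \qquad \frac{1}{p}\sum_{\substack{i=1 \\ i \ne l}}^p \exp\left(\frac{16\beta}{\sqrt{n}}\left|X_i'^T X_l'\right|\right) \le C.$$

If $X, X' \in \mathbb{R}^{p \times n}$ are random and independent with $x_{ij}, x'_{ij} \overset{IID}{\sim} \mathcal{N}(0,1)$, then

$$\mathbb{P}\left[(X, X') \notin \mathcal{G}(\alpha, \beta)\right] \le C'\left(p^{-\alpha} + pe^{-\alpha n}\right).$$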

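Proof. By Corollary 5.35 of [56], $\mathbb{P}\big[\|X\| > \sqrt{p} + (1 + \sqrt{2\alpha})\sqrt{n}\big] \le 2e^{-\alpha n}$, and similarly for $X'$.

For $\xi \sim \mathcal{N}(0,1)$ and any $u > 0$, $\mathbb{E}[e^{u|\xi|}] \le \mathbb{E}[e^{u\xi}] + \mathbb{E}[e^{-u\xi}] = 2e^{u^2/2}$ and $\operatorname{Var}[e^{u|\xi|}] \le \mathbb{E}[e^{2u|\xi|}] \le 2e^{2u^2}$.
Let $C(\alpha,u)$ and $c(\alpha,u)$ denote large and small constants that may change from instance to
instance. Defining $f(\xi) = e^{u|\xi|} - \mathbb{E}[e^{u|\xi|}]$, $\mathbb{E}[|f(\xi)|^{\alpha+2}] \le C(\alpha,u)$. Then for $\xi_1, \ldots, \xi_p \overset{IID}{\sim} \mathcal{N}(0,1)$,
applying Corollary 4 of m with $t = \alpha + 2$,

\[
\mathbb{P}\Big[\frac{1}{p}\sum_{i=1}^{p} e^{u|\xi_i|} > 3e^{u^2/2}\Big] \le \mathbb{P}\Big[\Big|\sum_{i=1}^{p} f(\xi_i)\Big| > pe^{u^2/2}\Big] \le C(\alpha,u)\,p^{-\alpha-1}.
\]

For any $i \ne l$, $(X_i^T X_l, X_l)$ is equal in law to $(\|X_l\|\xi_i, X_l)$ where $\xi_i \sim \mathcal{N}(0,1)$ is independent of $X_l$. Hence

\[
\mathbb{P}\Big[\frac{1}{p}\sum_{\substack{i=1 \\ i \ne l}}^{p} \exp\Big(\frac{16\beta|X_i^T X_l|}{\sqrt{n}}\Big) > 3\exp\Big(\frac{128\beta^2\|X_l\|^2}{n}\Big) \,\Big|\, X_l\Big] \le C\Big(\alpha, \frac{16\beta\|X_l\|}{\sqrt{n}}\Big)\,p^{-\alpha-1}
\]

and

\[
\mathbb{P}\Big[\frac{1}{p}\sum_{\substack{i=1 \\ i \ne l}}^{p} \exp\Big(\frac{16\beta|X_i^T X_l|}{\sqrt{n}}\Big) > C(\alpha,\beta) \,\Big|\, \|X_l\|^2 \le (1 + 2\alpha + 2\sqrt{\alpha})n\Big] \le C'(\alpha,\beta)\,p^{-\alpha-1}
\]

for some constants $C(\alpha,\beta)$ and $C'(\alpha,\beta)$. Lemma 1 of [38] implies the chi-squared tail bound
$\mathbb{P}[\|X_l\|^2 > (1 + 2\alpha + 2\sqrt{\alpha})n] \le e^{-\alpha n}$. The same argument holds for the analogous sums with
$X_i'^T X_l$, $X_i^T X_l'$, and $X_i'^T X_l'$ in place of $X_i^T X_l$, and the result follows by a union bound over $l$. □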


Lemma 3.2. Let $y, z \in \mathbb{R}^p$ satisfy $\|y\| \le 1$ and $\|z\| \le 1$. Under the setup of Theorem 1.6, let
$F(X) = z^T K(X) y$ and define $\mathcal{G}(\alpha,\beta)$ as in Lemma 3.1. Then for a constant $C := C(\alpha,\beta) > 0$
and any $t > 0$,

\[
\mathbb{E}\Big[e^{t(F(X) - F(X'))}\,\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\}\Big] \le 2\exp\Big(\frac{C\|y\|_\infty t^2 p^{1/2}(n+p)}{n^2}\Big).
\]

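Proof. Consider $F$ as a function from $\mathbb{R}^{pn}$ to $\mathbb{R}$. The gradient with respect to the $l$th row $X_l$ of $X$ is

\[
\nabla_{X_l} F(X) = \nabla_{X_l} \sum_{i=1}^{p} \sum_{\substack{i'=1 \\ i' \ne i}}^{p} \frac{z_i y_{i'}}{\sqrt{n}}\, k\Big(\frac{X_i^T X_{i'}}{\sqrt{n}}\Big)
= \sum_{\substack{i=1 \\ i \ne l}}^{p} \frac{1}{n}\, k'\Big(\frac{X_i^T X_l}{\sqrt{n}}\Big)(z_i y_l + y_i z_l) X_i = \frac{y_l}{n} v_z^T X + \frac{z_l}{n} v_y^T X,
\]

for $v_y, v_z \in \mathbb{R}^p$ with $(v_y)_i = k'(X_i^T X_l/\sqrt{n})\, y_i \mathbf{1}\{i \ne l\}$ and $(v_z)_i = k'(X_i^T X_l/\sqrt{n})\, z_i \mathbf{1}\{i \ne l\}$. This
yields the gradient bound

\[
\begin{aligned}
\|\nabla F(X)\|^2 = \sum_{l=1}^{p} \|\nabla_{X_l} F(X)\|^2 &\le \sum_{l=1}^{p} \Big(\frac{2y_l^2}{n^2}\|X\|^2\|v_z\|^2 + \frac{2z_l^2}{n^2}\|X\|^2\|v_y\|^2\Big) \\
&= \frac{4\|X\|^2}{n^2} \sum_{l=1}^{p} \sum_{\substack{i=1 \\ i \ne l}}^{p} k'\Big(\frac{X_i^T X_l}{\sqrt{n}}\Big)^2 z_l^2 y_i^2
\le \frac{4\|X\|^2}{n^2} \max_{1 \le l \le p} \sum_{\substack{i=1 \\ i \ne l}}^{p} k'\Big(\frac{X_i^T X_l}{\sqrt{n}}\Big)^2 y_i^2,
\end{aligned}
\]

where the last inequality applies $v^T M w \le \|v\|_1 \|Mw\|_\infty$. Applying Cauchy-Schwarz and the bound
$\sum_i y_i^4 \le \|y\|^2 \|y\|_\infty^2 \le \|y\|_\infty^2$,

\[
\|\nabla F(X)\|^2 \le \frac{4\|X\|^2}{n^2} \max_{1 \le l \le p} \Big(\sum_{\substack{i=1 \\ i \ne l}}^{p} k'\Big(\frac{X_i^T X_l}{\sqrt{n}}\Big)^4\Big)^{1/2} \Big(\sum_{i=1}^{p} y_i^4\Big)^{1/2}
\le \frac{4\|X\|^2 \|y\|_\infty}{n^2} \max_{1 \le l \le p} \Big(\sum_{\substack{i=1 \\ i \ne l}}^{p} k'\Big(\frac{X_i^T X_l}{\sqrt{n}}\Big)^4\Big)^{1/2}. \qquad (10)
\]
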

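We apply the integration argument of Maurey and Pisier: For each $\theta \in [0, \frac{\pi}{2}]$, let $X_\theta = X' \cos\theta + X \sin\theta$
and $\tilde{X}_\theta = -X' \sin\theta + X \cos\theta$. Then

\[
\begin{aligned}
\mathbb{E}\Big[e^{t(F(X) - F(X'))}\,\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\}\Big]
&= \mathbb{E}\Big[\exp\Big(\frac{2}{\pi}\int_0^{\pi/2} \frac{\pi t}{2}\,\frac{d}{d\theta} F(X_\theta)\,d\theta\Big)\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\}\Big] \\
&\le \frac{2}{\pi}\int_0^{\pi/2} \mathbb{E}\Big[\exp\Big(\frac{\pi t}{2}\,\frac{d}{d\theta} F(X_\theta)\Big)\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\}\Big]\,d\theta \\
&= \frac{2}{\pi}\int_0^{\pi/2} \mathbb{E}\Big[\exp\Big(\frac{\pi t}{2}\,\nabla F(X_\theta)^T \tilde{X}_\theta\Big)\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\}\Big]\,d\theta,
\end{aligned}
\]

where $\nabla F(X_\theta)^T \tilde{X}_\theta$ represents the vector inner-product in $\mathbb{R}^{pn}$. Noting that $X_\theta$ and $\tilde{X}_\theta$ are
independent and both equal in law to $X$, we may first condition on $X_\theta$ and use the Cauchy-Schwarz
inequality and the bound $\mathbb{E}[e^{c|(\tilde{X}_\theta)_{11}|}] \le \mathbb{E}[e^{c(\tilde{X}_\theta)_{11}}] + \mathbb{E}[e^{-c(\tilde{X}_\theta)_{11}}] \le 2e^{c^2/2}$ to obtain

\[
\begin{aligned}
\mathbb{E}\Big[e^{t(F(X) - F(X'))}\,\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\}\Big]
&\le \frac{2}{\pi}\int_0^{\pi/2} \mathbb{E}\Big[\mathbb{E}\big[\exp\big(\pi t\,\nabla F(X_\theta)^T \tilde{X}_\theta\big) \,\big|\, X_\theta\big]^{1/2}\, \mathbb{E}\big[\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\} \,\big|\, X_\theta\big]^{1/2}\Big]\,d\theta \\
&\le \frac{4}{\pi}\int_0^{\pi/2} \mathbb{E}\Big[\exp\Big(\frac{\pi^2 t^2 \|\nabla F(X_\theta)\|^2}{4}\Big)\, \mathbb{E}\big[\mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\} \,\big|\, X_\theta\big]^{1/2}\Big]\,d\theta \\
&\le \frac{4}{\pi}\int_0^{\pi/2} \mathbb{E}\Big[\exp\Big(\frac{\pi^2 t^2 \|\nabla F(X_\theta)\|^2}{2}\Big)\, \mathbf{1}\{(X,X') \in \mathcal{G}(\alpha,\beta)\}\Big]^{1/2}\,d\theta.
\end{aligned}
\]
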


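The definition of $\mathcal{G}(\alpha,\beta)$ implies that $\|\nabla F(X_\theta)\|^2$ is controlled over the entire integration path
$\theta \in [0, \frac{\pi}{2}]$: We have

\[
\|X_\theta\|^2 \le 2\|X'\|^2 (\cos\theta)^2 + 2\|X\|^2 (\sin\theta)^2 \le 2\max(\|X\|^2, \|X'\|^2),
\]

and also

\[
\begin{aligned}
\sum_{\substack{i=1 \\ i \ne l}}^{p} \exp\Big(\frac{4\beta|(X_\theta)_i^T (X_\theta)_l|}{\sqrt{n}}\Big)
&\le \sum_{\substack{i=1 \\ i \ne l}}^{p} \exp\Big(\frac{4\beta\big(|X_i^T X_l| + |X_i'^T X_l| + |X_i^T X_l'| + |X_i'^T X_l'|\big)}{\sqrt{n}}\Big) \\
&\le \Big(\sum_{i \ne l} \exp\Big(\frac{16\beta|X_i^T X_l|}{\sqrt{n}}\Big)\Big)^{1/4} \Big(\sum_{i \ne l} \exp\Big(\frac{16\beta|X_i'^T X_l|}{\sqrt{n}}\Big)\Big)^{1/4} \\
&\qquad \times \Big(\sum_{i \ne l} \exp\Big(\frac{16\beta|X_i^T X_l'|}{\sqrt{n}}\Big)\Big)^{1/4} \Big(\sum_{i \ne l} \exp\Big(\frac{16\beta|X_i'^T X_l'|}{\sqrt{n}}\Big)\Big)^{1/4}
\end{aligned}
\]

by Hölder's inequality. Then for any $(X, X') \in \mathcal{G}(\alpha,\beta)$ and $\theta \in [0, \frac{\pi}{2}]$, (10) implies

\[
\|\nabla F(X_\theta)\|^2 \le \frac{C\|y\|_\infty\, p^{1/2}(n+p)}{n^2},
\]

and the result follows. □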

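Let us now recall $D_2^p$, $\pi_{l \setminus l-1}$, and $\pi_l$ from Definition

Lemma 3.3. For any symmetric matrix $M \in \mathbb{R}^{p\times p}$, $\|M\| \le 10 \sup_{y \in D_2^p} y^T M y$.

Proof. For any $x \in \mathbb{R}^p$ with $\|x\| \le 1$, we may construct $y \in D_2^p$ such that

\[
y_i = \begin{cases} 2^{-l/2}\operatorname{sign}(x_i) & 2^{-l} \le x_i^2 < 2^{-l+1} \\ 0 & x_i^2 < 2^{-m-3}. \end{cases}
\]

Then $\|y\| \le \|x\| \le 1$ and, letting $c = (1 - 1/\sqrt{2})^2$,

\[
\|x - y\|^2 = \sum_{i : x_i^2 \ge 2^{-m-3}} (x_i - y_i)^2 + \sum_{i : x_i^2 < 2^{-m-3}} x_i^2
\le \sum_{i : x_i^2 \ge 2^{-m-3}} c\,x_i^2 + \sum_{i : x_i^2 < 2^{-m-3}} x_i^2
\le c + (1 - c) \sum_{i : x_i^2 < 2^{-m-3}} x_i^2 \le c + \frac{1 - c}{8} < (9/20)^2.
\]

The result then follows from Lemma 5.4 of [56]. □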
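
The rounding step in this proof can be illustrated with a short sketch. It assumes $m = \lceil \log_2 p \rceil$ as in Lemma 3.4 and an input vector of Euclidean norm at most 1; the function name `dyadic_round` is hypothetical and not taken from the paper.

```python
import math
import random


def dyadic_round(x):
    """Round each entry of x (with ||x|| <= 1) to 0 or to +/- 2^(-l/2), as in the proof of Lemma 3.3."""
    p = len(x)
    m = math.ceil(math.log2(p))
    cutoff = 2.0 ** (-(m + 3))
    y = []
    for xi in x:
        sq = xi * xi
        if sq < cutoff:
            y.append(0.0)
        else:
            # frexp gives sq = mant * 2**e with 0.5 <= mant < 1,
            # so 2**(e-1) <= sq < 2**e and the dyadic level is l = 1 - e.
            _, e = math.frexp(sq)
            level = 1 - e
            y.append(math.copysign(2.0 ** (-level / 2.0), xi))
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    g = [rng.gauss(0.0, 1.0) for _ in range(50)]
    norm = math.sqrt(sum(v * v for v in g))
    x = [v / norm for v in g]
    y = dyadic_round(x)
    print(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))  # stays below 9/20
```

Each nonzero $y_i$ has the sign of $x_i$ and magnitude at most $|x_i|$, which is why $\|y\| \le \|x\|$ holds entrywise.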

Lemma 3.4. Let $m = \lceil \log_2 p \rceil$. For some $C > 0$ and all $l \in \{0, 1, \ldots, m+3\}$,

\[
\log |\{\pi_l(y) : y \in D_2^p\}| \le C(m + 4 - l)2^l.
\]

Proof. Let $C > 0$ denote a constant that may change from instance to instance. For any $l \in \{0, 1, \ldots, m\}$,

\[
|\{\pi_{l \setminus l-1}(y) : y \in D_2^p\}| \le \sum_{k=0}^{2^l} \binom{p}{k} 2^k,
\]

as there are at most $2^l$ non-zero entries of $\pi_{l \setminus l-1}(y)$, and for each non-zero entry there are two
choices of sign. Using $\binom{p}{k} \le (ep/k)^k$ and noting that $k \mapsto (2ep)^k k^{-k}$ is monotonically increasing over
$k \in [0, 2p]$ and that $2^l \le 2p$ for $l \le m$, this implies

\[
\log |\{\pi_{l \setminus l-1}(y) : y \in D_2^p\}| \le \log\Big((2^l + 1)\Big(\frac{2ep}{2^l}\Big)^{2^l}\Big) \le \log\Big((2^l + 1)\big(2e\,2^{m-l}\big)^{2^l}\Big) \le C(m + 1 - l)2^l.
\]

For $l \in \{m+1, m+2, m+3\}$, we use the bound $|\{\pi_{l \setminus l-1}(y) : y \in D_2^p\}| \le 3^p$, as each coordinate of
$\pi_{l \setminus l-1}(y)$ takes one of three values. Then

\[
\log |\{\pi_{l \setminus l-1}(y) : y \in D_2^p\}| \le C2^m \le C(m + 4 - l)2^l.
\]

Combining these bounds,

\[
\log |\{\pi_l(y) : y \in D_2^p\}| \le \sum_{j=0}^{l} \log |\{\pi_{j \setminus j-1}(y) : y \in D_2^p\}| \le C \sum_{j=0}^{l} (m + 4 - j)2^j \le C(m + 4 - l)2^l.
\]

□
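
As a sanity check on the counting bound for levels $l \le m$, the sum $\sum_{k \le 2^l} \binom{p}{k} 2^k$ can be evaluated exactly. The sketch below is illustrative only: the constant `C = 3.0` is an assumption chosen for this check, not a value stated in the paper, and the function names are hypothetical.

```python
import math


def log_count_bound(p, l):
    """Logarithm of sum_{k=0}^{2^l} C(p, k) * 2^k, the counting bound for level l <= m."""
    total = sum(math.comb(p, k) * 2 ** k for k in range(0, 2 ** l + 1))
    return math.log(total)


def check_level(p, l, C=3.0):
    """Check log_count_bound(p, l) <= C * (m + 1 - l) * 2^l with m = ceil(log2 p)."""
    m = math.ceil(math.log2(p))
    return log_count_bound(p, l) <= C * (m + 1 - l) * 2 ** l


if __name__ == "__main__":
    p = 100
    m = math.ceil(math.log2(p))
    for l in range(m + 1):
        print(l, round(log_count_bound(p, l), 2), 3.0 * (m + 1 - l) * 2 ** l)
```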


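Lemma 3.5. Under the setup of Theorem 1.6, let $m = \lceil \log_2 p \rceil$ and let $\mathcal{G}(\alpha,\beta)$ be as in Lemma 3.1.
Then there are constants $C, c > 0$ depending only on $\alpha$ and $\beta$ such that for any $l \in \{0, 1, \ldots, m+3\}$,
$j = l$ or $j = l - 1$ (if $l \ge 1$), and $t > 0$,

\[
\mathbb{P}\Big[\sup_{y \in D_2^p} \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y) > t \text{ and } (X, X') \in \mathcal{G}(\alpha,\beta)\Big] \le 2\exp\Big(C(m + 4 - l)2^l - \frac{ct^2\, 2^{l/2} n^2}{p^{1/2}(n+p)}\Big).
\]

Proof. For notational convenience, define the event $\mathcal{E} := \{(X, X') \in \mathcal{G}(\alpha,\beta)\}$. Applying Lemma
3.4 and a union bound over $\{\pi_l(x) : x \in D_2^p\}$, for any $\lambda > 0$,

\[
\begin{aligned}
\mathbb{P}\Big[\sup_{y \in D_2^p} \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y) > t \text{ and } \mathcal{E}\Big]
&\le e^{C(m+4-l)2^l} \sup_{y \in \{\pi_l(x) : x \in D_2^p\}} \mathbb{P}\big[\pi_j(y)^T K(X) \pi_{l \setminus l-1}(y) > t \text{ and } \mathcal{E}\big] \\
&\le e^{C(m+4-l)2^l} e^{-\lambda t} \sup_{y \in \{\pi_l(x) : x \in D_2^p\}} \mathbb{E}\Big[e^{\lambda \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y)}\, \mathbf{1}\{\mathcal{E}\}\Big].
\end{aligned}
\]

Let $\mathcal{D}$ be the set of all diagonal matrices in $\mathbb{R}^{p\times p}$ with all diagonal entries in $\{-1, 1\}$. Note that
$(X, X') \in \mathcal{G}(\alpha,\beta)$ if and only if $(X, DX') \in \mathcal{G}(\alpha,\beta)$ for all $D \in \mathcal{D}$. Then, conditional on $X$ and the
event $\mathcal{E}$, $X'$ equals $DX'$ in law for $D$ uniformly distributed over $\mathcal{D}$. Hence

\[
\mathbb{E}[K(X') \mid X, \mathcal{E}] = \mathbb{E}[K(DX') \mid X, \mathcal{E}] = \mathbb{E}\big[\mathbb{E}[K(DX') \mid X', X, \mathcal{E}] \,\big|\, X, \mathcal{E}\big] = 0,
\]

where the last equality follows from $\mathbb{E}[K(DX') \mid X'] = 0$ as the kernel function $k$ is odd. Then
Jensen's inequality yields, for any $y \in D_2^p$ and $\lambda > 0$,

\[
\mathbb{E}\Big[e^{-\lambda \pi_j(y)^T K(X') \pi_{l \setminus l-1}(y)} \,\Big|\, X, \mathcal{E}\Big] \ge 1,
\]

and so

\[
\begin{aligned}
\mathbb{E}\Big[e^{\lambda \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y)}\, \mathbf{1}\{\mathcal{E}\}\Big]
&= \mathbb{E}\Big[e^{\lambda \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y)} \,\Big|\, \mathcal{E}\Big]\, \mathbb{P}[\mathcal{E}] \\
&\le \mathbb{E}\Big[e^{\lambda \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y)}\, \mathbb{E}\big[e^{-\lambda \pi_j(y)^T K(X') \pi_{l \setminus l-1}(y)} \,\big|\, X, \mathcal{E}\big] \,\Big|\, \mathcal{E}\Big]\, \mathbb{P}[\mathcal{E}] \\
&= \mathbb{E}\Big[e^{\lambda \pi_j(y)^T (K(X) - K(X')) \pi_{l \setminus l-1}(y)} \,\Big|\, \mathcal{E}\Big]\, \mathbb{P}[\mathcal{E}] \\
&= \mathbb{E}\Big[e^{\lambda \pi_j(y)^T (K(X) - K(X')) \pi_{l \setminus l-1}(y)}\, \mathbf{1}\{\mathcal{E}\}\Big] \\
&\le 2\exp\Big(\frac{C\lambda^2 p^{1/2}(n+p)}{2^{l/2} n^2}\Big),
\end{aligned}
\]

where the last line applies Lemma 3.2 and the bound $\|\pi_{l \setminus l-1}(y)\|_\infty \le 2^{-l/2}$. Optimizing over $\lambda$
yields the desired result. □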
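
For completeness, the optimization over $\lambda$ can be spelled out as a routine Chernoff computation. In this sketch the labels $C_1$ (the constant from Lemma 3.4), $C_2$ (the constant from Lemma 3.2), and the shorthand $B$ are introduced only for illustration and do not appear in the original text. Writing $B := p^{1/2}(n+p)/(2^{l/2} n^2)$, the two displays above combine to

\[
\mathbb{P}\Big[\sup_{y \in D_2^p} \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y) > t \text{ and } \mathcal{E}\Big] \le 2\exp\big(C_1(m + 4 - l)2^l - \lambda t + C_2 \lambda^2 B\big).
\]

The quadratic $-\lambda t + C_2 \lambda^2 B$ is minimized at $\lambda = t/(2C_2 B)$, where it equals

\[
-\frac{t^2}{4C_2 B} = -\frac{t^2\, 2^{l/2} n^2}{4C_2\, p^{1/2}(n+p)},
\]

so the stated bound would hold with $c = 1/(4C_2)$ under these assumptions.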


We now conclude the proof of Theorem 1.6.

Proof of Theorem 1.6. For each $l = 0, \ldots, m+3$, set

\[
t_l^2 = \frac{C_0 (m + 4 - l)\, 2^{l/2} p^{1/2}(n+p)}{n^2}
\]

for a constant $C_0 := C_0(\alpha,\beta)$. Let $X'$ be an independent copy of $X$. Then by Lemma 3.5, for each
$l = 0, \ldots, m+3$ and $j = l$ or $j = l - 1$,

\[
\mathbb{P}\Big[\sup_{y \in D_2^p} \pi_j(y)^T K(X) \pi_{l \setminus l-1}(y) > t_l \text{ and } (X, X') \in \mathcal{G}(\alpha,\beta)\Big] \le 2e^{(C - cC_0)(m+4-l)2^l}.
\]

Recalling $m = \lceil \log_2 p \rceil$, we may pick $C_0$ sufficiently large such that

\[
\sum_{l=0}^{m+3} 4e^{(C - cC_0)(m+4-l)2^l} \le 4(m+4)e^{(C - cC_0)(m+4)} \le C'p^{-\alpha}
\]

for a constant $C' := C'(\alpha,\beta)$. Then (7) and a union bound imply

\[
\mathbb{P}\Big[\sup_{y \in D_2^p} y^T K(X) y > 2\sum_{l=0}^{m+3} t_l \text{ and } (X, X') \in \mathcal{G}(\alpha,\beta)\Big] \le C'p^{-\alpha}.
\]

Finally, the bound

\[
\sum_{l=0}^{m+3} t_l = \frac{C_0^{1/2} p^{1/4}(n+p)^{1/2}}{n} \sum_{l=0}^{m+3} (m + 4 - l)^{1/2} 2^{l/4}
= \frac{C_0^{1/2} p^{1/4}(n+p)^{1/2}}{n} \sum_{j=0}^{m+3} (j+1)^{1/2} 2^{(m+3-j)/4}
\le \frac{C p^{1/4}(n+p)^{1/2} p^{1/4}}{n} \le C \max\Big(\frac{p}{n}, \sqrt{\frac{p}{n}}\Big),
\]

the decomposition (7), and Lemmas 3.1 and 3.3 yield the desired result. □
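
The final summation can be checked numerically. The sketch below assumes the thresholds $t_l^2 = C_0(m+4-l)2^{l/2}p^{1/2}(n+p)/n^2$ with the illustrative choice $C_0 = 1$; the function names are hypothetical, and the ratio it prints is only evidence that $\sum_l t_l$ scales like $\max(p/n, \sqrt{p/n})$, not a statement of the sharp constant.

```python
import math


def sum_of_thresholds(n, p, C0=1.0):
    """Sum of t_l over l = 0..m+3, with t_l^2 = C0*(m+4-l)*2^(l/2)*sqrt(p)*(n+p)/n^2."""
    m = math.ceil(math.log2(p))
    total = 0.0
    for l in range(m + 4):
        t_sq = C0 * (m + 4 - l) * 2 ** (l / 2) * math.sqrt(p) * (n + p) / n ** 2
        total += math.sqrt(t_sq)
    return total


def target_rate(n, p):
    """The rate max(p/n, sqrt(p/n)) appearing in the concentration bound."""
    return max(p / n, math.sqrt(p / n))


if __name__ == "__main__":
    for n, p in ((100, 10), (100, 100), (100, 10000), (10**5, 10**3)):
        print(n, p, sum_of_thresholds(n, p) / target_rate(n, p))
```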

4. Decomposition of Hermite polynomials of sums of IID random variables 

In this section, we prove the approximation Q formalized as the following proposition: 

Proposition 4.1. Let $Z = (z_j : 1 \le j \le n) \in \mathbb{R}^n$, where $z_j$ are IID random variables such that
$\mathbb{E}[z_j] = \mathbb{E}[z_j^3] = 0$, $\mathbb{E}[z_j^2] = 1$, and $\mathbb{E}[|z_j|^l] < \infty$ for each $l \ge 1$. Let $h_d$ denote the orthonormal
Hermite polynomial of degree $d$. Define

\[
q_{d,n}(Z) = \sqrt{\frac{1}{n^d d!}} \sum_{\substack{j_1, \ldots, j_d = 1 \\ j_1 \ne \cdots \ne j_d}}^{n} \prod_{i=1}^{d} z_{j_i}, \qquad (11)
\]

\[
r_{d,n}(Z) = \sqrt{\frac{1}{n^d d!}} \binom{d}{2} \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^2 - 1) \prod_{i=2}^{d-1} z_{j_i}, \qquad (12)
\]

\[
s_{d,n}(Z) = h_d\Big(n^{-1/2} \sum_{j=1}^{n} z_j\Big) - q_{d,n}(Z) - r_{d,n}(Z). \qquad (13)
\]

Then, for each $d \ge 1$ and any $\alpha, \beta > 0$, $\mathbb{P}[|s_{d,n}(Z)| > n^{-1+\alpha}] < n^{-\beta}$ for all sufficiently large $n$ (i.e.
for $n > N$ where $N$ may depend on $\alpha$, $\beta$, $d$, and the distribution of $z_j$).

The following lemma shows that $\mathbb{P}[|q_{d,n}(Z)| > n^{\alpha}] < n^{-\beta}$ and $\mathbb{P}[|r_{d,n}(Z)| > n^{-\frac{1}{2}+\alpha}] < n^{-\beta}$ for
any $\alpha, \beta > 0$ and all sufficiently large $n$. Hence Proposition 4.1 may be interpreted as decomposing
$h_d(n^{-1/2} \sum_{j=1}^{n} z_j)$ into the sum of an $O(1)$ term $q_{d,n}(Z)$, an $O(n^{-1/2})$ term $r_{d,n}(Z)$, and an $O(n^{-1})$
term $s_{d,n}(Z)$.
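
The decomposition can be verified by brute force for small degrees. The sketch below works with the monic Hermite polynomials $x^2 - 1$ and $x^3 - 3x$ (that is, $\sqrt{d!}$ times the orthonormal ones), so its outputs are $\sqrt{d!}$ times $q_{d,n}$, $r_{d,n}$, $s_{d,n}$. The function names are hypothetical, and the closed form for the degree-3 remainder, $n^{-3/2}\sum_j (z_j^3 - 3z_j)$, comes from a direct expansion of $S^3$ carried out for this illustration.

```python
import itertools
import math
import random


def distinct_sum(values_per_slot):
    """Sum, over tuples of pairwise distinct indices, of the product of the slot values."""
    d = len(values_per_slot)
    n = len(values_per_slot[0])
    total = 0.0
    for idx in itertools.permutations(range(n), d):
        prod = 1.0
        for slot, j in enumerate(idx):
            prod *= values_per_slot[slot][j]
        total += prod
    return total


def monic_parts(z, d):
    """Return (q, r, s) in the monic normalization for degree d in {2, 3}."""
    n = len(z)
    S = sum(z) / math.sqrt(n)
    he = {2: S ** 2 - 1.0, 3: S ** 3 - 3.0 * S}[d]
    sq_minus_one = [v * v - 1.0 for v in z]
    q = n ** (-d / 2) * distinct_sum([z] * d)
    r = n ** (-d / 2) * math.comb(d, 2) * distinct_sum([sq_minus_one] + [z] * (d - 2))
    return q, r, he - q - r


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.choice([-1.0, 1.0]) * rng.random() * 2 for _ in range(7)]
    print(monic_parts(z, 2)[2])  # remainder vanishes for d = 2
    print(monic_parts(z, 3)[2], len(z) ** -1.5 * sum(v ** 3 - 3 * v for v in z))
```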

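Lemma 4.2. Suppose $z_1, \ldots, z_n$ are IID random variables, with $\mathbb{E}[|z_j|^l] < \infty$ for all $l \ge 1$. Let
$p_1, \ldots, p_d : \mathbb{R} \to \mathbb{R}$ be any polynomial functions such that $\mathbb{E}[p_i(z_j)] = 0$ for each $i = 1, \ldots, d$. Then
for any $\alpha, \beta > 0$,

\[
\mathbb{P}\Big[\Big|n^{-\frac{d}{2}} \sum_{\substack{j_1, \ldots, j_d = 1 \\ j_1 \ne \cdots \ne j_d}}^{n} \prod_{i=1}^{d} p_i(z_{j_i})\Big| > n^{\alpha}\Big] < n^{-\beta}
\]

for all sufficiently large $n$.

Proof. Fix $\alpha, \beta > 0$. Let

\[
f(z_1, \ldots, z_n) = n^{-\frac{d}{2}} \sum_{\substack{j_1, \ldots, j_d = 1 \\ j_1 \ne \cdots \ne j_d}}^{n} \prod_{i=1}^{d} p_i(z_{j_i})
\]

and let $l$ be an even integer such that $\alpha l > \beta$. Then

\[
\mathbb{P}\big[|f(z_1, \ldots, z_n)| > n^{\alpha}\big] \le \frac{\mathbb{E}[f(z_1, \ldots, z_n)^l]}{n^{\alpha l}},
\]

and it suffices to show $\mathbb{E}[f(z_1, \ldots, z_n)^l] \le C$ for a constant $C$ independent of $n$. Note that

\[
\mathbb{E}\big[f(z_1, \ldots, z_n)^l\big] = n^{-\frac{dl}{2}} \sum_{\substack{j_1^1, \ldots, j_d^1 = 1 \\ j_1^1 \ne \cdots \ne j_d^1}}^{n} \cdots \sum_{\substack{j_1^l, \ldots, j_d^l = 1 \\ j_1^l \ne \cdots \ne j_d^l}}^{n} \mathbb{E}\Big[\prod_{i=1}^{d} \prod_{k=1}^{l} p_i(z_{j_i^k})\Big].
\]

For each term of the above sum, if there is some $j$ such that $j = j_i^k$ for exactly one pair of indices
$i \in \{1, \ldots, d\}$ and $k \in \{1, \ldots, l\}$, then the expectation of that term is 0 as $\mathbb{E}[p_i(z_j)] = 0$ and $z_j$ is
independent of $z_1, \ldots, z_{j-1}, z_{j+1}, \ldots, z_n$. Hence, for terms in the sum with non-zero expectation,
there are at most $\frac{dl}{2}$ distinct values of $j_i^k$. Then the number of such terms is at most $Cn^{\frac{dl}{2}}$, and the
magnitude of each such term is at most $C'$, for some constants $C, C'$ independent of $n$, establishing
$\mathbb{E}[f(z_1, \ldots, z_n)^l] \le C$. □

Proof of Proposition 4.1. Let $S = n^{-1/2} \sum_{j=1}^{n} z_j$. It will be notationally convenient to work with the
monic Hermite polynomials $\tilde{h}_d = \sqrt{d!}\,h_d$. Let us accordingly define $\tilde{q}_{d,n} = q_{d,n}\sqrt{d!}$, $\tilde{r}_{d,n} = r_{d,n}\sqrt{d!}$,
and $\tilde{s}_{d,n} = s_{d,n}\sqrt{d!}$. Then

\[
\tilde{h}_d(S) = \tilde{q}_{d,n}(Z) + \tilde{r}_{d,n}(Z) + \tilde{s}_{d,n}(Z),
\]

and we wish to show for any $\alpha, \beta > 0$, $\mathbb{P}[|\tilde{s}_{d,n}(Z)| > n^{-1+\alpha}] < n^{-\beta}$ for all sufficiently large $n$.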

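We proceed by induction on $d$. Note that $\tilde{h}_0(x) = 1$, $\tilde{h}_1(x) = x$, and $\tilde{h}_2(x) = x^2 - 1$. Then for
$d = 1$, $\tilde{h}_1(S) = S = \tilde{q}_{1,n}(Z)$, and for $d = 2$,

\[
\tilde{h}_2(S) = S^2 - 1 = n^{-1}\Big(\sum_{\substack{j_1, j_2 = 1 \\ j_1 \ne j_2}}^{n} z_{j_1} z_{j_2} + \sum_{j=1}^{n} (z_j^2 - 1)\Big) = \tilde{q}_{2,n}(Z) + \tilde{r}_{2,n}(Z).
\]

Hence the proposition holds with $\tilde{s}_{1,n}(Z) = \tilde{s}_{2,n}(Z) = 0$.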

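Let us assume by induction that the proposition holds for $d - 1$ and $d$. Recall that the monic
Hermite polynomials satisfy the three-term recurrence $\tilde{h}_{d+1}(x) = x\tilde{h}_d(x) - d\tilde{h}_{d-1}(x)$ (c.f. eq. (5.5.8)
of [53]). We may compute

\[
\begin{aligned}
S\tilde{q}_{d,n}(Z) &= n^{-\frac{d+1}{2}} \sum_{j=1}^{n} z_j \sum_{\substack{j_1, \ldots, j_d = 1 \\ j_1 \ne \cdots \ne j_d}}^{n} \prod_{i=1}^{d} z_{j_i} \\
&= n^{-\frac{d+1}{2}} \Big(\sum_{\substack{j_1, \ldots, j_{d+1} = 1 \\ j_1 \ne \cdots \ne j_{d+1}}}^{n} \prod_{i=1}^{d+1} z_{j_i} + d \sum_{\substack{j_1, \ldots, j_d = 1 \\ j_1 \ne \cdots \ne j_d}}^{n} z_{j_1}^2 \prod_{i=2}^{d} z_{j_i}\Big) \\
&= \tilde{q}_{d+1,n}(Z) + \frac{2}{d+1}\tilde{r}_{d+1,n}(Z) + \frac{d(n-d+1)}{n}\tilde{q}_{d-1,n}(Z),
\end{aligned}
\]

\[
\begin{aligned}
S\tilde{r}_{d,n}(Z) &= n^{-\frac{d+1}{2}} \binom{d}{2} \sum_{j=1}^{n} z_j \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^2 - 1) \prod_{i=2}^{d-1} z_{j_i} \\
&= n^{-\frac{d+1}{2}} \binom{d}{2} \Big(\sum_{\substack{j_1, \ldots, j_d = 1 \\ j_1 \ne \cdots \ne j_d}}^{n} (z_{j_1}^2 - 1) \prod_{i=2}^{d} z_{j_i} + \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^3 - z_{j_1}) \prod_{i=2}^{d-1} z_{j_i} \\
&\qquad\qquad + (d-2) \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^2 - 1) z_{j_2}^2 \prod_{i=3}^{d-1} z_{j_i}\Big) \\
&= \frac{d-1}{d+1}\tilde{r}_{d+1,n}(Z) + n^{-\frac{d+1}{2}} \binom{d}{2} \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^3 - z_{j_1}) \prod_{i=2}^{d-1} z_{j_i} \\
&\qquad + n^{-\frac{d+1}{2}} \binom{d}{2} (d-2) \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^2 - 1)(z_{j_2}^2 - 1) \prod_{i=3}^{d-1} z_{j_i} + \frac{d(n-d+2)}{n}\tilde{r}_{d-1,n}(Z).
\end{aligned}
\]

Substituting these expressions into the three-term recurrence,

\[
\begin{aligned}
\tilde{h}_{d+1}(S) &= S\big(\tilde{q}_{d,n}(Z) + \tilde{r}_{d,n}(Z) + \tilde{s}_{d,n}(Z)\big) - d\big(\tilde{q}_{d-1,n}(Z) + \tilde{r}_{d-1,n}(Z) + \tilde{s}_{d-1,n}(Z)\big) \\
&= \tilde{q}_{d+1,n}(Z) + \tilde{r}_{d+1,n}(Z) + \tilde{s}_{d+1,n}(Z)
\end{aligned}
\]

for

\[
\begin{aligned}
\tilde{s}_{d+1,n}(Z) := {} & -\frac{d(d-1)}{n}\tilde{q}_{d-1,n}(Z) + n^{-\frac{d+1}{2}} \binom{d}{2} \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^3 - z_{j_1}) \prod_{i=2}^{d-1} z_{j_i} \\
& + n^{-\frac{d+1}{2}} \binom{d}{2} (d-2) \sum_{\substack{j_1, \ldots, j_{d-1} = 1 \\ j_1 \ne \cdots \ne j_{d-1}}}^{n} (z_{j_1}^2 - 1)(z_{j_2}^2 - 1) \prod_{i=3}^{d-1} z_{j_i} - \frac{d(d-2)}{n}\tilde{r}_{d-1,n}(Z) \\
& + S\tilde{s}_{d,n}(Z) - d\tilde{s}_{d-1,n}(Z) =: I + II + III + IV + V + VI.
\end{aligned}
\]
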

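Fix $\alpha, \beta > 0$. Note that $\mathbb{E}[z_j] = 0$, $\mathbb{E}[z_j^2 - 1] = 0$, and $\mathbb{E}[z_j^3 - z_j] = 0$, so by Lemma 4.2,

\[
\max\Big(\mathbb{P}\big[|I| > n^{-1+\frac{\alpha}{2}}\big], \mathbb{P}\big[|II| > n^{-1+\frac{\alpha}{2}}\big], \mathbb{P}\big[|III| > n^{-1+\frac{\alpha}{2}}\big], \mathbb{P}\big[|IV| > n^{-1+\frac{\alpha}{2}}\big]\Big) < n^{-2\beta}
\]

for all large $n$. By the induction hypothesis, $\mathbb{P}[|\tilde{s}_{d,n}| > n^{-1+\frac{\alpha}{4}}] < n^{-2\beta}/2$ for all large $n$, and also
$\mathbb{P}[|S| > n^{\frac{\alpha}{4}}] < n^{-2\beta}/2$ for all large $n$ by Lemma 4.2 (applied to the simple case where $d = 1$ and
$p_1(x) = x$). Then $\mathbb{P}[|V| > n^{-1+\frac{\alpha}{2}}] < n^{-2\beta}$ for all large $n$. Similarly, the induction hypothesis
implies $\mathbb{P}[|VI| > n^{-1+\frac{\alpha}{2}}] < n^{-2\beta}$ for all large $n$. Putting this together,

\[
\mathbb{P}\big[|\tilde{s}_{d+1,n}(Z)| > n^{-1+\alpha}\big] \le \mathbb{P}\big[|I + II + III + IV + V + VI| > 6n^{-1+\frac{\alpha}{2}}\big] < 6n^{-2\beta} < n^{-\beta}
\]

for all large $n$, completing the induction. □

5. Bounding the dominant matrix
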


Consider the polynomial kernel matrix $K(X)$ in Theorem 1.7. Throughout this section, we let
$D < \infty$ denote the (fixed) degree of the polynomial $k$, and we write

\[
k(x) = \sum_{d=1}^{D} a_d h_d(x).
\]

Corresponding to the decomposition of $h_d$ given in Proposition 4.1, we consider the following
decomposition of $K(X)$:

Definition 5.1. Define $Q(X) = (q_{ii'} : 1 \le i, i' \le p) \in \mathbb{R}^{p\times p}$ with entries

\[
q_{ii'} = \begin{cases} \dfrac{1}{\sqrt{n}} \displaystyle\sum_{d=1}^{D} a_d\, q_{d,n}(x_{i1}x_{i'1}, \ldots, x_{in}x_{i'n}) & i \ne i' \\ 0 & i = i', \end{cases}
\]

where $q_{d,n}$ is as in (11). Define $R(X) \in \mathbb{R}^{p\times p}$ and $S(X) \in \mathbb{R}^{p\times p}$ analogously with $r_{d,n}$ and $s_{d,n}$ in
place of $q_{d,n}$, where $r_{d,n}$ and $s_{d,n}$ are as in (12) and (13).

With the above definitions, $K(X) = Q(X) + R(X) + S(X)$. In this section, we establish the
following result:


Proposition 5.2. Under the conditions of Theorem 1.7, letting $Q(X)$ be as in Definition 5.1,
$\limsup_{n,p\to\infty} \|Q(X)\| \le \|\mu_{a,\nu,\gamma}\|$ almost surely.

Figure 3. A multi-labeling of an $l$-graph for $l = 4$ and $D = 3$. $p$-vertices are
depicted with a circle and $n$-vertices are depicted with a square.

Our proof uses the moment method and the moment comparison argument described in Section
2. The following definitions of an $l$-graph and a multi-labeling of such a graph will correspond to
the primary combinatorial object of interest in the subsequent analysis.

Definition 5.3. For any integer $l \ge 2$, an $l$-graph is a graph consisting of a single cycle with $2l$
vertices and $2l$ edges, with the vertices alternatingly denoted as $p$-vertices and $n$-vertices.

We will consider the vertices of the $l$-graph to be ordered by picking an arbitrary $p$-vertex as the
first vertex and ordering the remaining vertices according to a traversal along the cycle. A vertex
$V$ "follows" or "precedes" another vertex $W$ if $V$ comes after or before $W$, respectively, in this
ordering, and the last vertex of the cycle (which is an $n$-vertex) is followed by the first $p$-vertex.

Definition 5.4. A multi-labeling of an $l$-graph is an assignment of a $p$-label in $\{1, 2, 3, \ldots\}$
to each $p$-vertex and an ordered tuple of $n$-labels in $\{1, 2, 3, \ldots\}$ to each $n$-vertex, such that the
following conditions are satisfied:

(1) The $p$-label of each $p$-vertex is distinct from those of the two $p$-vertices immediately preceding
and following it in the cycle.

(2) The number $d_s$ of $n$-labels in the tuple for each $s$th $n$-vertex satisfies $1 \le d_s \le D$, and these
$d_s$ $n$-labels are distinct.

(3) For each distinct $p$-label $i$ and distinct $n$-label $j$, there are an even number of edges in the
cycle (possibly 0) such that its $p$-vertex endpoint is labeled $i$ and its $n$-vertex endpoint has
label $j$ in its tuple.

A $(p, n)$-multi-labeling is a multi-labeling with all $p$-labels in $\{1, \ldots, p\}$ and all $n$-labels in
$\{1, \ldots, n\}$.

A key bound on the number of possible distinct $p$-labels and $n$-labels that appear in a multi-labeling
of an $l$-graph is provided by the following lemma. We will always consider $p$-labels to
be distinct from $n$-labels, even though (for notational convenience) we use the same label set
$\{1, 2, 3, \ldots\}$ for both.


Lemma 5.5. Suppose a multi-labeling of an $l$-graph has $d_1, \ldots, d_l$ $n$-labels on the first through
$l$th $n$-vertices, respectively, and suppose that it has $m$ total distinct $p$-labels and $n$-labels. Then
$m \le \frac{l + \sum_{s=1}^{l} d_s}{2} + 1$.

We defer the proof of Lemma 5.5 to Appendix A. Figure 3 shows an example of a multi-labeling
of an $l$-graph for $l = 4$ and $D = 3$. In this multi-labeling, $\sum_{s=1}^{l} d_s = 3 + 3 + 1 + 1 = 8$ and the
number of total distinct labels is $m = 3 + 4 = 7$, so Lemma 5.5 holds with equality.
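
The conditions of Definition 5.4 and the count in Lemma 5.5 can be checked mechanically. The following is a minimal sketch; the function name and the concrete labeling in the usage example are hypothetical (the labeling is chosen to match the counts quoted for Figure 3 and is not claimed to be the labeling drawn there). The returned slack is the quantity $\frac{l + \sum_s d_s}{2} + 1 - m$.

```python
from collections import Counter
from fractions import Fraction


def check_multi_labeling(p_labels, n_tuples, D):
    """Validate conditions (1)-(3) of Definition 5.4; return (m, slack, b).

    p_labels[s] labels the s-th p-vertex; n_tuples[s] is the label tuple of the
    n-vertex between p-vertex s and p-vertex s+1 (cyclically).
    """
    l = len(p_labels)
    if len(n_tuples) != l:
        raise ValueError("need one n-tuple per p-vertex")
    for s in range(l):
        # (1): consecutive p-vertices on the cycle carry different labels.
        if p_labels[s] == p_labels[(s + 1) % l]:
            raise ValueError("condition (1) fails")
        # (2): between 1 and D distinct n-labels on each n-vertex.
        t = n_tuples[s]
        if not (1 <= len(t) <= D) or len(set(t)) != len(t):
            raise ValueError("condition (2) fails")
    # b[(i, j)]: edges whose p-endpoint is labeled i and whose n-endpoint contains j.
    b = Counter()
    for s in range(l):
        for j in n_tuples[s]:
            b[(p_labels[s], j)] += 1
            b[(p_labels[(s + 1) % l], j)] += 1
    if any(c % 2 for c in b.values()):
        raise ValueError("condition (3) fails")
    n_labels = {j for t in n_tuples for j in t}
    m = len(set(p_labels)) + len(n_labels)
    slack = Fraction(l + sum(len(t) for t in n_tuples), 2) + 1 - m
    return m, slack, b


if __name__ == "__main__":
    # Hypothetical labeling with l = 4, D = 3 and tuple sizes 3, 3, 1, 1.
    m, slack, b = check_multi_labeling(
        [1, 2, 1, 3], [("a", "b", "c"), ("c", "a", "b"), ("d",), ("d",)], D=3
    )
    print(m, slack, sorted(set(b.values())))
```

In this example every label pair is used on exactly two edges, and the bound of Lemma 5.5 is attained.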


The non-negative quantity $\frac{l + \sum_{s=1}^{l} d_s}{2} + 1 - m$ appears in many of our combinatorial lemmas, and
we give it a name:

Definition 5.6. Suppose a multi-labeling of an $l$-graph has $d_1, \ldots, d_l$ $n$-labels on the first through
$l$th $n$-vertices, respectively, and suppose that it has $m$ total distinct $p$-labels and $n$-labels. The
excess of the multi-labeling is $\Delta := \frac{l + \sum_{s=1}^{l} d_s}{2} + 1 - m$.

A high-level intuition, which we make precise in various ways in Appendix A, is that multi-labelings
with zero or small excess satisfy many regularity properties. For example, we prove the
following in Appendix A:

Lemma 5.7. Suppose a multi-labeling of an $l$-graph has excess $\Delta$. For each $i \in \{1, 2, 3, \ldots\}$ and
$j \in \{1, 2, 3, \ldots\}$, let $b_{ij}$ be the number of edges in the $l$-graph such that the $p$-vertex endpoint is
labeled $i$ and the $n$-vertex endpoint has label $j$ in its tuple. Then $\sum_{i,j : b_{ij} > 2} b_{ij} \le 12\Delta$.

In particular, a multi-labeling with excess $\Delta = 0$ has either $b_{ij} = 2$ or $b_{ij} = 0$ for every label-pair
$(i, j)$, by the above lemma and condition (3) of Definition 5.4. This indeed holds for the example
of Figure 3.

Definition 5.8. Two multi-labelings of an $l$-graph are equivalent if there is a permutation $\pi_p$ of
$\{1, 2, 3, \ldots\}$ and a permutation $\pi_n$ of $\{1, 2, 3, \ldots\}$ such that one labeling is the image of the other
upon applying $\pi_p$ to all of its $p$-labels and $\pi_n$ to all of its $n$-labels. For any fixed $l$, the equivalence
classes under this relation will be called multi-labeling equivalence classes.

The number of distinct $p$-labels, number of distinct $n$-labels, number of $n$-labels $d_1, \ldots, d_l$ on
each of the $l$ $n$-vertices, and excess $\Delta$ are equivalence class properties, i.e. they are the same for all
labelings in the same multi-labeling equivalence class. The connection between Definition 5.4 of a
multi-labeling and our matrix of interest $Q(X)$ is provided by the following lemma:


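Lemma 5.9. Let $Q(X)$ be as in Proposition 5.2, and let $l \ge 2$ be an even integer. Let $\mathcal{C}$ denote
the set of all multi-labeling equivalence classes for an $l$-graph. For each multi-labeling equivalence
class $\mathcal{L} \in \mathcal{C}$, let $\Delta(\mathcal{L})$ be the excess, $r(\mathcal{L})$ the number of distinct $p$-labels, and $d_1(\mathcal{L}), \ldots, d_l(\mathcal{L})$ the
number of $n$-labels on the first to $l$th $n$-vertices, respectively. Then, for $\alpha > 0$ as in the moment bound
$\mathbb{E}[|x_{ij}|^k] \le k^{\alpha k}$ and with the convention $0^0 = 1$,

\[
\mathbb{E}[\operatorname{Tr} Q(X)^l] \le n \sum_{\mathcal{L} \in \mathcal{C}} \Big(\frac{(12\Delta(\mathcal{L}))^{12\alpha}}{n}\Big)^{\Delta(\mathcal{L})} \Big(\frac{p}{n}\Big)^{r(\mathcal{L})} \prod_{s=1}^{l} \frac{|a_{d_s(\mathcal{L})}|}{(d_s(\mathcal{L})!)^{1/2}}. \qquad (14)
\]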


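Proof. By Definition 5.1, letting $i_{l+1} := i_1$ for notational convenience,

\[
\begin{aligned}
\mathbb{E}[\operatorname{Tr} Q(X)^l] &= \sum_{\substack{i_1, \ldots, i_l = 1 \\ i_1 \ne i_2, i_2 \ne i_3, \ldots, i_l \ne i_1}}^{p} \mathbb{E}\Big[\prod_{s=1}^{l} q_{i_s i_{s+1}}\Big] \\
&= \sum_{\substack{i_1, \ldots, i_l = 1 \\ i_1 \ne i_2, i_2 \ne i_3, \ldots, i_l \ne i_1}}^{p} \mathbb{E}\Big[\prod_{s=1}^{l} \Big(\frac{1}{\sqrt{n}} \sum_{d=1}^{D} a_d \sqrt{\frac{1}{n^d d!}} \sum_{\substack{j_1, \ldots, j_d = 1 \\ j_1 \ne j_2 \ne \cdots \ne j_d}}^{n} \prod_{a=1}^{d} x_{i_s j_a} x_{i_{s+1} j_a}\Big)\Big] \\
&= \sum_{\substack{i_1, \ldots, i_l = 1 \\ i_1 \ne i_2, i_2 \ne i_3, \ldots, i_l \ne i_1}}^{p} \sum_{d_1, \ldots, d_l = 1}^{D} \sum_{\substack{j_1^1, \ldots, j_{d_1}^1 = 1 \\ j_1^1 \ne \cdots \ne j_{d_1}^1}}^{n} \cdots \sum_{\substack{j_1^l, \ldots, j_{d_l}^l = 1 \\ j_1^l \ne \cdots \ne j_{d_l}^l}}^{n} n^{-\frac{l + \sum_{s=1}^{l} d_s}{2}} \Big(\prod_{s=1}^{l} \frac{a_{d_s}}{(d_s!)^{1/2}}\Big) \mathbb{E}\Big[\prod_{s=1}^{l} \prod_{a=1}^{d_s} x_{i_s j_a^s} x_{i_{s+1} j_a^s}\Big].
\end{aligned}
\]

Note that as $x_{ij}$ equals $-x_{ij}$ in law by assumption, $\mathbb{E}[x_{ij}^c] = 0$ for any positive odd integer $c$. Hence, if
any $x_{ij}$ appears an odd number of times in the expression $\prod_{s=1}^{l} \prod_{a=1}^{d_s} x_{i_s j_a^s} x_{i_{s+1} j_a^s}$, then as the
entries of $X$ are independent, $\mathbb{E}\big[\prod_{s=1}^{l} \prod_{a=1}^{d_s} x_{i_s j_a^s} x_{i_{s+1} j_a^s}\big] = 0$. We identify the combination of
sums above, over the remaining non-zero terms, as the sum over all possible $(p, n)$-multi-labelings
of an $l$-graph. Here, the first sum over $i_1, \ldots, i_l$ is over all choices of $p$-labels, with condition (1)
in Definition 5.4 corresponding to the constraints $i_1 \ne i_2, i_2 \ne i_3, \ldots, i_l \ne i_1$ in the sum. The
sum over $d_1, \ldots, d_l$ is over all choices of the number of $n$-labels in the tuple for each $n$-vertex, and
the sum over $j_1^s, \ldots, j_{d_s}^s$ is over all choices of $d_s$ $n$-labels for the $s$th $n$-vertex, with condition (2) in
Definition 5.4 corresponding to the constraint that $j_1^s, \ldots, j_{d_s}^s$ are distinct. The product expression
$\prod_{s=1}^{l} \prod_{a=1}^{d_s} x_{i_s j_a^s} x_{i_{s+1} j_a^s}$ then corresponds to a product, over all $n$-vertices, all $d_s$ $n$-labels for that
$n$-vertex, and both $p$-vertices immediately preceding and immediately following that $n$-vertex, of $x_{ij}$,
where $j \in \{1, \ldots, n\}$ is the $n$-label and $i \in \{1, \ldots, p\}$ is the $p$-label of the $p$-vertex. The condition
that each $x_{ij}$ appears an even number of times so that this term has non-zero expectation is precisely
condition (3) in Definition 5.4. Thus, to summarize,

\[
\mathbb{E}[\operatorname{Tr} Q(X)^l] = \sum_{l\text{-graph } (p,n)\text{-multi-labelings}} n^{-\frac{l + \sum_{s=1}^{l} d_s}{2}} \Big(\prod_{s=1}^{l} \frac{a_{d_s}}{(d_s!)^{1/2}}\Big) \mathbb{E}\Big[\prod_{s=1}^{l} \prod_{a=1}^{d_s} x_{i_s j_a^s} x_{i_{s+1} j_a^s}\Big],
\]

where $d_1, \ldots, d_l$ are the numbers of $n$-labels for the first through $l$th $n$-vertices, respectively.

Consider a fixed $(p, n)$-multi-labeling and write $\prod_{s=1}^{l} \prod_{a=1}^{d_s} x_{i_s j_a^s} x_{i_{s+1} j_a^s} = \prod_{i=1}^{p} \prod_{j=1}^{n} x_{ij}^{b_{ij}}$, where
$b_{ij}$ is the number of times $x_{ij}$ appears as a term in this product. Note that each $b_{ij}$ is even (possibly
0). As $\mathbb{E}[x_{ij}^2] = 1$, $\mathbb{E}[|x_{ij}|^k] \le k^{\alpha k}$, and the entries of $X$ are independent,

\[
\mathbb{E}\Big[\prod_{s=1}^{l} \prod_{a=1}^{d_s} x_{i_s j_a^s} x_{i_{s+1} j_a^s}\Big] = \prod_{i=1}^{p} \prod_{j=1}^{n} \mathbb{E}\big[x_{ij}^{b_{ij}}\big] \le \prod_{i,j : b_{ij} > 2} b_{ij}^{\alpha b_{ij}} \le \Big(\sum_{i,j : b_{ij} > 2} b_{ij}\Big)^{\alpha \sum_{i,j : b_{ij} > 2} b_{ij}} \le (12\Delta)^{12\alpha\Delta},
\]

where the last inequality applies Lemma 5.7 and we use the convention $0^0 = 1$. (14) then follows
upon noting that each $(p, n)$-multi-labeling with $r$ distinct $p$-labels and $m - r$ distinct $n$-labels
has $\frac{p!}{(p-r)!} \frac{n!}{(n-m+r)!} \le n^m \big(\frac{p}{n}\big)^r$ $(p, n)$-multi-labelings in its equivalence class, and
$n^{-\frac{l + \sum_{s=1}^{l} d_s}{2} + m} = n^{1 - \Delta}$. □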


We wish to compare the upper bound in (14) to an analogous quantity for a deformed GUE
matrix:

Definition 5.10. For $\tilde{n}, p \ge 1$, let $W = (w_{ii'} : 1 \le i, i' \le p) \in \mathbb{C}^{p\times p}$ be distributed according to
the GUE, i.e. $\{w_{ii} : 1 \le i \le p\} \cup \{\sqrt{2}\operatorname{Re} w_{ii'}, \sqrt{2}\operatorname{Im} w_{ii'} : 1 \le i < i' \le p\}$ are IID $\mathcal{N}(0,1)$, and
$w_{ii'} = \overline{w_{i'i}}$ for $i > i'$. Let $V \in \mathbb{R}^{p\times p}$ be standard real Wishart-distributed with $\tilde{n}$ degrees of freedom
and zero diagonal, i.e. $V = ZZ^T - \operatorname{diag}(\|Z_i\|_2^2)$ where $Z = (z_{ij} : 1 \le i \le p, 1 \le j \le \tilde{n}) \in \mathbb{R}^{p\times\tilde{n}}$,
$z_{ij} \overset{IID}{\sim} \mathcal{N}(0,1)$, and $ZZ^T - \operatorname{diag}(\|Z_i\|_2^2)$ denotes $ZZ^T$ with its diagonal set to 0. Take $V$ and $W$ to
be independent, and define

\[
M = \sqrt{\frac{\gamma(\nu - a^2)}{p}}\, W + \frac{a}{\tilde{n}} V \in \mathbb{C}^{p\times p}.
\]

As $\tilde{n}, p \to \infty$ with $p/\tilde{n} \to \gamma$, the limiting spectral distribution of $M$ is also $\mu_{a,\nu,\gamma}$. It follows from
the results of [16] that, in fact, a norm convergence result holds for $M$, i.e. $\lim_{\tilde{n},p\to\infty} \|M\| = \|\mu_{a,\nu,\gamma}\|$,
using which we may establish the following Proposition:

Figure 4. A simple labeling of an $l$-graph for $l = 4$. $p$-vertices are depicted with a
circle and $n$-vertices are depicted with a square.

Proposition 5.11. Let $M$ be as in Definition 5.10. Suppose $l$ is an even integer and $\tilde{n}, p, l \to \infty$
with $p/\tilde{n} \to \gamma$ and $l \le C \log \tilde{n}$ for some constant $C > 0$. Then, for any $\varepsilon > 0$ and all sufficiently
large $\tilde{n}$,

\[
\mathbb{E}[\|M\|^l] \le (\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l.
\]

The proof of Proposition 5.11 is deferred to Appendix B. As $p^{-1}\mathbb{E}[\operatorname{Tr} M^l] \le \mathbb{E}[\|M\|^l]$, our strategy
for proving Proposition 5.2 will be to show that the upper bound in (14) can in turn be bounded
above using the quantity $\mathbb{E}[\operatorname{Tr} M^l]$, for some choices of $p$ and $\tilde{n}$. To analyze $\mathbb{E}[\operatorname{Tr} M^l]$, we consider
the following notion of a simple-labeling of an $l$-graph:


Definition 5.12. A simple-labeling of an $l$-graph is an assignment of a $p$-label in $\{1, 2, 3, \ldots\}$
to each $p$-vertex and either one $n$-label in $\{1, 2, 3, \ldots\}$ or the empty label $\emptyset$ to each $n$-vertex, such
that the following conditions are satisfied:

(1) The $p$-label of each $p$-vertex is distinct from those of the two $p$-vertices immediately preceding
and following it in the cycle.

(2) For each distinct $p$-label $i$ and distinct non-empty $n$-label $j$, there are an even number of
edges in the cycle (possibly 0) such that its $p$-vertex endpoint is labeled $i$ and its $n$-vertex
endpoint is labeled $j$.

(3) For any two distinct $p$-labels $i$ and $i'$, the number of occurrences (possibly 0) of the three
consecutive labels $i, \emptyset, i'$ on a $p$-vertex, its following $n$-vertex, and its following $p$-vertex is
equal to the number of occurrences of the three consecutive labels $i', \emptyset, i$.

A $(p, \tilde{n})$-simple-labeling is a simple-labeling with all $p$-labels in $\{1, \ldots, p\}$ and all non-empty
$n$-labels in $\{1, \ldots, \tilde{n}\}$.

Analogous to Lemma 5.5, the following lemma provides a key bound on the number of possible
distinct $p$-labels and $n$-labels that appear in a simple-labeling of an $l$-graph.


Lemma 5.13. Suppose a simple-labeling of an $l$-graph has $\tilde{k}$ $n$-vertices with non-empty label and
$\tilde{m}$ total distinct $p$-labels and distinct non-empty $n$-labels. Then $\tilde{m} \le \frac{l + \tilde{k}}{2} + 1$.
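
The conditions of Definition 5.12 and the count in Lemma 5.13 can be checked in the same mechanical way as for multi-labelings. The sketch below is illustrative: the function name is hypothetical, `None` stands for the empty label, and the labeling in the usage example is an assumed one with $l = 4$, two non-empty $n$-vertices, and four distinct labels in total (matching the counts quoted for Figure 4, without claiming to reproduce the figure).

```python
from collections import Counter
from fractions import Fraction


def check_simple_labeling(p_labels, n_labels):
    """Validate conditions (1)-(3) of Definition 5.12; return (k, m, slack).

    n_labels[s] is the label of the n-vertex between p-vertex s and p-vertex s+1
    (cyclically), or None for the empty label.
    """
    l = len(p_labels)
    if len(n_labels) != l:
        raise ValueError("need one n-vertex per p-vertex")
    pair_counts = Counter()   # edges between p-label i and non-empty n-label j
    empty_steps = Counter()   # occurrences of consecutive labels (i, empty, i')
    for s in range(l):
        i, i_next = p_labels[s], p_labels[(s + 1) % l]
        if i == i_next:
            raise ValueError("condition (1) fails")
        j = n_labels[s]
        if j is None:
            empty_steps[(i, i_next)] += 1
        else:
            pair_counts[(i, j)] += 1
            pair_counts[(i_next, j)] += 1
    if any(c % 2 for c in pair_counts.values()):
        raise ValueError("condition (2) fails")
    if any(empty_steps.get((i2, i), 0) != c for (i, i2), c in empty_steps.items()):
        raise ValueError("condition (3) fails")
    k = sum(j is not None for j in n_labels)
    m = len(set(p_labels)) + len({j for j in n_labels if j is not None})
    return k, m, Fraction(l + k, 2) + 1 - m


if __name__ == "__main__":
    print(check_simple_labeling([1, 2, 1, 3], [None, None, "d", "d"]))
```

The third returned value is the slack $\frac{l + \tilde{k}}{2} + 1 - \tilde{m}$ in Lemma 5.13, which is zero for this example.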


The proof of Lemma 5.13 is deferred to Appendix A. We may then define the excess of a simple-labeling,
analogous to Definition 5.6, and note that the excess is always nonnegative.

Definition 5.14. Suppose a simple-labeling of an $l$-graph has $\tilde{k}$ $n$-vertices with non-empty label
and $\tilde{m}$ total distinct $p$-labels and distinct non-empty $n$-labels. The excess of the simple-labeling
is $\tilde{\Delta} := \frac{l + \tilde{k}}{2} + 1 - \tilde{m}$.

Figure 4 shows a simple-labeling of an $l$-graph for $l = 4$, with $\tilde{k} = 2$ $n$-vertices having non-empty
label and $\tilde{m} = 3 + 1 = 4$ distinct $p$-labels and non-empty $n$-labels. Hence in this example, Lemma
5.13 holds with equality, and the excess is $\tilde{\Delta} = 0$.

Definition 5.15. Two simple-labelings of an $l$-graph are equivalent if there is a permutation $\pi_p$
of $\{1, 2, 3, \ldots\}$ and a permutation $\pi_n$ of $\{1, 2, 3, \ldots\}$ such that one labeling is the image of the other
upon applying $\pi_p$ to all of its $p$-labels and $\pi_n$ to all of its $n$-labels. (The empty $n$-label remains
empty under any such permutation $\pi_n$.) For any fixed $l$, the equivalence classes under this relation
will be called simple-labeling equivalence classes.

Motivation for Definition 5.12 of a simple labeling is provided by the following lemma, which
gives a lower bound for the quantity $\mathbb{E}[\operatorname{Tr} M^l]$:


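Lemma 5.16. Let $M$ be as in Definition 5.10, and let $l \ge 2$ be an even integer. Let $\tilde{\mathcal{C}}$ denote the
set of all simple-labeling equivalence classes for an $l$-graph. For each simple-labeling equivalence
class $\tilde{\mathcal{L}} \in \tilde{\mathcal{C}}$, let $\tilde{\Delta}(\tilde{\mathcal{L}})$ be its excess, $\tilde{k}(\tilde{\mathcal{L}})$ be the number of $n$-vertices with non-empty label, and $\tilde{r}(\tilde{\mathcal{L}})$
be the number of distinct $p$-labels. Then, with the convention $0^0 = 1$,

\[
\mathbb{E}[\operatorname{Tr} M^l] \ge \tilde{n} \Big(\frac{p - l}{p}\Big)^{l} \Big(\frac{\tilde{n} - l}{\tilde{n}}\Big)^{l} \sum_{\tilde{\mathcal{L}} \in \tilde{\mathcal{C}}} \Big(\frac{1}{\tilde{n}}\Big)^{\tilde{\Delta}(\tilde{\mathcal{L}})} \Big(\frac{p}{\tilde{n}}\Big)^{\tilde{r}(\tilde{\mathcal{L}}) - \frac{l - \tilde{k}(\tilde{\mathcal{L}})}{2}} |a|^{\tilde{k}(\tilde{\mathcal{L}})} \big(\gamma(\nu - a^2)\big)^{\frac{l - \tilde{k}(\tilde{\mathcal{L}})}{2}}. \qquad (15)
\]
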


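Proof. By Definition 5.10, letting $i_{l+1} := i_1$ for notational convenience,

\[
\begin{aligned}
\mathbb{E}[\operatorname{Tr} M^l] &= \mathbb{E}\Big[\operatorname{Tr}\Big(\sqrt{\frac{\gamma(\nu - a^2)}{p}}\, W + \frac{a}{\tilde{n}} V\Big)^{l}\Big] \\
&= \sum_{i_1, \ldots, i_l = 1}^{p} \mathbb{E}\Big[\prod_{s=1}^{l} \Big(\sqrt{\frac{\gamma(\nu - a^2)}{p}}\, w_{i_s i_{s+1}} + \frac{a}{\tilde{n}} v_{i_s i_{s+1}}\Big)\Big] \\
&= \sum_{S \subseteq \{1, \ldots, l\}} \sum_{i_1, \ldots, i_l = 1}^{p} \Big(\frac{a}{\tilde{n}}\Big)^{|S|} \Big(\frac{\gamma(\nu - a^2)}{p}\Big)^{\frac{l - |S|}{2}} \mathbb{E}\Big[\prod_{s \in S} v_{i_s i_{s+1}}\Big] \mathbb{E}\Big[\prod_{s \notin S} w_{i_s i_{s+1}}\Big] \\
&= \sum_{S \subseteq \{1, \ldots, l\}} \sum_{\substack{i_1, \ldots, i_l = 1 \\ i_s \ne i_{s+1} \forall s \in S}}^{p} \tilde{n}^{-\frac{l + |S|}{2}} \Big(\frac{p}{\tilde{n}}\Big)^{-\frac{l - |S|}{2}} a^{|S|} \big(\gamma(\nu - a^2)\big)^{\frac{l - |S|}{2}} \mathbb{E}\Big[\prod_{s \in S} v_{i_s i_{s+1}}\Big] \mathbb{E}\Big[\prod_{s \notin S} w_{i_s i_{s+1}}\Big] \\
&= \sum_{S \subseteq \{1, \ldots, l\}} \sum_{\substack{i_1, \ldots, i_l = 1 \\ i_s \ne i_{s+1} \forall s \in S}}^{p} \sum_{(j_s : s \in S) \in \{1, \ldots, \tilde{n}\}^{|S|}} \tilde{n}^{-\frac{l + |S|}{2}} \Big(\frac{p}{\tilde{n}}\Big)^{-\frac{l - |S|}{2}} a^{|S|} \big(\gamma(\nu - a^2)\big)^{\frac{l - |S|}{2}} \mathbb{E}\Big[\prod_{s \in S} z_{i_s j_s} z_{i_{s+1} j_s}\Big] \mathbb{E}\Big[\prod_{s \notin S} w_{i_s i_{s+1}}\Big].
\end{aligned}
\]

In the fourth line above, we restricted the summation to $i_s \ne i_{s+1}$ $\forall s \in S$, as $v_{ii} = 0$ for each
$i = 1, \ldots, p$ by Definition 5.10.

Let us write $\prod_{s \in S} z_{i_s j_s} z_{i_{s+1} j_s} = \prod_{i=1}^{p} \prod_{j=1}^{\tilde{n}} z_{ij}^{c_{ij}}$ where $c_{ij}$ is the number of times $z_{ij}$ appears in
this product, and let us write $\prod_{s \notin S} w_{i_s i_{s+1}} = \prod_{i=1}^{p} w_{ii}^{a_{ii}} \prod_{1 \le i < i' \le p} w_{ii'}^{a_{ii'}} \overline{w_{ii'}}^{\,b_{ii'}}$, where $a_{ii'}$ and $b_{ii'}$ are
the numbers of times $w_{ii'}$ and $w_{i'i}$ appear in this product, respectively. $\mathbb{E}\big[\prod_{i=1}^{p} \prod_{j=1}^{\tilde{n}} z_{ij}^{c_{ij}}\big] \ne 0$ only
if each $c_{ij}$ is even (possibly zero), in which case this quantity is at least 1. Similarly, note that if
$w = re^{i\theta}$ is such that $\sqrt{2}\operatorname{Re} w, \sqrt{2}\operatorname{Im} w \overset{IID}{\sim} \mathcal{N}(0,1)$, then $r$ and $\theta$ are independent with $r^2 \sim \operatorname{Exp}(1)$
and $\theta \sim \operatorname{Unif}[0, 2\pi)$. Then $\mathbb{E}[w^a \overline{w}^b] = \mathbb{E}[r^{a+b}]\,\mathbb{E}[e^{i(a-b)\theta}]$ for all nonnegative integers $a, b$, and this is
0 if $a \ne b$ and at least 1 if $a = b \ge 0$. Hence $\mathbb{E}\big[\prod_{i=1}^{p} w_{ii}^{a_{ii}} \prod_{1 \le i < i' \le p} w_{ii'}^{a_{ii'}} \overline{w_{ii'}}^{\,b_{ii'}}\big] = 0$ unless $a_{ii'} = b_{ii'}$
for each $i' > i$ and $a_{ii}$ is even (possibly zero) for each $1 \le i \le p$, in which case this quantity is also
at least 1.

The above arguments imply, in particular, that $\mathbb{E}\big[\prod_{s \notin S} w_{i_s i_{s+1}}\big] = 0$ unless $l - |S|$ is even. As $l$
is even by assumption, then $|S|$ must also be even, in which case $a^{|S|} = |a|^{|S|} \ge 0$. Hence each term
of the sum in the above expression for $\mathbb{E}[\operatorname{Tr} M^l]$ is nonnegative, so a lower bound is obtained if we
further restrict the summation to $i_s \ne i_{s+1}$ $\forall s \in \{1, \ldots, l\}$ (rather than just $\forall s \in S$), i.e.

\[
\mathbb{E}[\operatorname{Tr} M^l] \ge \sum_{S \subseteq \{1, \ldots, l\}} \sum_{\substack{i_1, \ldots, i_l = 1 \\ i_1 \ne i_2, i_2 \ne i_3, \ldots, i_l \ne i_1}}^{p} \sum_{(j_s : s \in S) \in \{1, \ldots, \tilde{n}\}^{|S|}} \tilde{n}^{-\frac{l + |S|}{2}} \Big(\frac{p}{\tilde{n}}\Big)^{-\frac{l - |S|}{2}} |a|^{|S|} \big(\gamma(\nu - a^2)\big)^{\frac{l - |S|}{2}} \mathbb{E}\Big[\prod_{s \in S} z_{i_s j_s} z_{i_{s+1} j_s}\Big] \mathbb{E}\Big[\prod_{s \notin S} w_{i_s i_{s+1}}\Big].
\]

We identify the combination of these sums as a sum over all $(p, \tilde{n})$-simple-labelings of an $l$-graph.
Here, the first sum over $S$ is over all choices of the subset of $n$-vertices having non-empty label.
The second sum over $i_1, \ldots, i_l$ is over all choices of $p$-labels, with condition (1) in Definition 5.12
corresponding to the constraints $i_1 \ne i_2, i_2 \ne i_3, \ldots, i_l \ne i_1$. The last sum over $(j_s : s \in S)$ is
over all choices of $n$-labels for the $n$-vertices that have nonempty label. The product expression
$\prod_{s \in S} z_{i_s j_s} z_{i_{s+1} j_s}$ then corresponds to a product over all $n$-vertices with non-empty label and both
$p$-vertices immediately preceding and following that $n$-vertex, and the condition that each $z_{ij}$ appears
an even number of times corresponds to condition (2) in Definition 5.12. Similarly, the product
expression $\prod_{s \notin S} w_{i_s i_{s+1}}$ corresponds to a product over all $n$-vertices with empty label, and the
condition that each $w_{ii'}$ appears the same number of times as $w_{i'i}$ is precisely condition (3) in
Definition 5.12. (By restricting the sum to $i_s \ne i_{s+1}$ for all $s$, no diagonal terms $w_{ii}$ appear in
this product.) Applying the bound $\mathbb{E}\big[\prod_{s \in S} z_{i_s j_s} z_{i_{s+1} j_s}\big]\, \mathbb{E}\big[\prod_{s \notin S} w_{i_s i_{s+1}}\big] \ge 1$ whenever this quantity
is nonzero,

\[
\mathbb{E}[\operatorname{Tr} M^l] \ge \sum_{l\text{-graph } (p,\tilde{n})\text{-simple-labelings}} \tilde{n}^{-\frac{l + \tilde{k}}{2}} \Big(\frac{p}{\tilde{n}}\Big)^{-\frac{l - \tilde{k}}{2}} |a|^{\tilde{k}} \big(\gamma(\nu - a^2)\big)^{\frac{l - \tilde{k}}{2}},
\]

where $\tilde{k} = |S|$ is the number of $n$-vertices in the simple-labeling with non-empty label. Any
simple labeling with $\tilde{r}$ distinct $p$-labels and $\tilde{m} - \tilde{r}$ distinct non-empty $n$-labels has

\[
\frac{p!}{(p - \tilde{r})!} \cdot \frac{\tilde{n}!}{(\tilde{n} - \tilde{m} + \tilde{r})!} \ge \tilde{n}^{\tilde{m}} \Big(\frac{p}{\tilde{n}}\Big)^{\tilde{r}} \Big(\frac{p - l}{p}\Big)^{l} \Big(\frac{\tilde{n} - l}{\tilde{n}}\Big)^{l}
\]

$(p, \tilde{n})$-simple-labelings in its equivalence class (where we have used
$\tilde{m} - \tilde{r} \le l$ and $\tilde{r} \le l$). The desired result then follows upon identifying $\tilde{n}^{1 - \tilde{\Delta}} = \tilde{n}^{-\frac{l + \tilde{k}}{2} + \tilde{m}}$. □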


The remainder of the proof of Proposition 5.2 involves a comparison of the upper bound in (14)
and the lower bound in (15). The intuition for the comparison is the following: The dominant
contributions to the sums in (14) and (15) come from labelings with small excess. Focusing on
labelings with excess 0, if we take any multi-labeling equivalence class $\mathcal{L}$ with $\Delta(\mathcal{L}) = 0$ and
replace the labels of n-vertices having more than one n-label with $\emptyset$, then it may be shown that we
obtain a valid simple-labeling equivalence class $\tilde{\mathcal{L}}$ with $\tilde\Delta(\tilde{\mathcal{L}}) = 0$. For example, the multi-labeling
of Figure 3 is mapped to the simple labeling of Figure 4 under this procedure. Furthermore, for
any $\tilde{\mathcal{L}}$ with $\tilde\Delta(\tilde{\mathcal{L}}) = 0$, we may show


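\[
\sum_{\mathcal{L} :\ \mathcal{L} \text{ maps to } \tilde{\mathcal{L}}} \ \prod_{s=1}^{l} \frac{|a_{d_s(\mathcal{L})}|}{(d_s(\mathcal{L})!)^{1/2}} = |a|^{k(\tilde{\mathcal{L}})} (\nu - a^2)^{\frac{l - k(\tilde{\mathcal{L}})}{2}}.
\]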

(The arguments that establish these claims are a specialization of our combinatorial lemmas in
Appendix A to the cases of $\Delta = 0$ and $\tilde\Delta = 0$.) Hence, this mapping yields an exact correspondence
between terms in (14) with excess $\Delta(\mathcal{L}) = 0$ and terms in (15) with excess $\tilde\Delta(\tilde{\mathcal{L}}) = 0$.

As we must consider $l \asymp \log n$ to establish a tight bound in spectral norm, we need to also handle
terms in (14) where $\Delta(\mathcal{L}) \neq 0$. We do so by extending the above mapping to all multi-labeling
equivalence classes $\mathcal{L}$, in the case $a \neq 0$. The properties of this mapping that we will need are
summarized in the following proposition.

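Proposition 5.17. Suppose $a \neq 0$ and $l \ge 2$. Let $\mathcal{C}$ and $\tilde{\mathcal{C}}$ denote the set of all multi-labeling and
simple labeling equivalence classes of an $l$-graph, respectively. For $\mathcal{L} \in \mathcal{C}$, let $\Delta(\mathcal{L})$ be its excess and
$r(\mathcal{L})$ be the number of distinct p-labels, and for $\tilde{\mathcal{L}} \in \tilde{\mathcal{C}}$, let $\tilde\Delta(\tilde{\mathcal{L}})$ be its excess, $\tilde r(\tilde{\mathcal{L}})$ be the number
of distinct p-labels, and $k(\tilde{\mathcal{L}})$ be the number of n-vertices with non-empty label. Then there exists a
map $\varphi : \mathcal{C} \to \tilde{\mathcal{C}}$ such that, for some constants $C_1, C_2, C_3, C_4 > 0$ depending only on $D$,

(1) For all $\mathcal{L} \in \mathcal{C}$, $\tilde r(\varphi(\mathcal{L})) = r(\mathcal{L})$,

(2) For all $\mathcal{L} \in \mathcal{C}$, $\tilde\Delta(\varphi(\mathcal{L})) \le C_1 \Delta(\mathcal{L})$, and

(3) For any $\tilde{\mathcal{L}} \in \tilde{\mathcal{C}}$ and $\Delta_0 \ge 0$,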


\ C 2 A 0 


C&^(C) s=l V V ' ' \ 1 1 / 

A(£)=A 0 


2 ^tzMLl l C 3+ c 4 Ao 


(16) 


The proof of this proposition and the explicit construction of the map $\varphi$ require some detailed
combinatorial arguments, which we defer to Appendix A. Using this result, we may complete the
proof of Proposition 5.2 in the case $a \neq 0$.


Proof of Proposition 5.2 (Case $a \neq 0$). For any $\varepsilon > 0$ and even integer $l \ge 2$,


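\[
\mathbb{P}\big[\|Q(X)\| > (1+\varepsilon)\|\mu_{a,\nu,\gamma}\|\big] \le \mathbb{P}\big[\operatorname{Tr} Q(X)^l > \big((1+\varepsilon)\|\mu_{a,\nu,\gamma}\|\big)^l\big] \le \frac{\mathbb{E}[\operatorname{Tr} Q(X)^l]}{\big((1+\varepsilon)\|\mu_{a,\nu,\gamma}\|\big)^l}.
\]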

By Lemma 5.9, Definition 5.6, and Proposition 5.17,


E[Tr Q{X) 1 ] <n^_] 


’(12 (^)) 12 "\ AiC) /py(C) ' 1 


c&c 


n 


n 


n 


\ a d4C)\ 


l+Dl 

2 


-»E E E 

£ec Ao= rA^)i ce<p~HC) 
1 1 1 A(£)=A 0 


( d s (£)\) 1/2 

{61 + 6 Dl) 12a \ Ao (py{F) 


n 


n 


l a d 3 (£)l 


< n Y^ 


p\r(C) 


£ec 


n 


l+Dl 

2 


E 

MT 1 ! 


(6 1 + 6 Dl) 


»> (W) 1/J 

12o\ A 0 / C 2 A 0 


n 


< nf 3 { l +^ + 1 ) ^ 


CeC 


p\r(C) b(r)/ 2\— 

-) ar (tJ {is - a 2 ) 2 

n/ 


fA |a| fc ( £ )(u-a 2 )™l c 3+c 4 / 

l°l ) 

({ 6 l + 6 Dl) 12a (]f) C2 1 ° 4 


A(£) 
C 1 


V 


n 


where the last line holds for all sufficiently large $n$ if $l \asymp \log n$. Let


n = 


1 

n c i 


12a / y-\ S 1 O 4 

(6/ + 6 DI) c i 1 
and let p = L^fJ- Then for all sufficiently large n and l x logn, nl Cs (-)- 1) < n 2 , and also
(2)d£) < (S) T 'C-^)(1 + |) l (as f(L) < l, p/n -A 7, and p/n —> 7). Then 


, / 1 \ A (£) 

E[TVQ(X)‘]<n 2 (l + |)'^(^j 

£ec V 2 


n ) 


On the other hand, by Lemma 5.16 


, / 1 \ A (£) / z.\ ?{£) _ _ 

(‘-ioeQ (?) |«I W ( 

cec 


o. i-fe(£) 

is-a 2 < E 


TV M‘ 


for all sufficiently large n. Since p/n —>• 7 and l ~ BC± logn if l ~ B logn, Proposition 5.11 implies 
E[TrM^] < pE[||M||*] < p (||//a iI/i7 || (l + §)) Z for all large n. Thus 

( (1 4- £) 2 \ l 

P[||QWII > (l + eJIl/wll] <» ? 1 




(1- I) (l + e) 


(l+-) 2 

Taking l ~ B log n with B > 0 sufficiently large such that B log ^ ^ < —4 (which is possible 

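for any sufficiently small $\varepsilon > 0$), this implies $\mathbb{P}[\|Q(X)\| > (1+\varepsilon)\|\mu_{a,\nu,\gamma}\|] \le n^{-2}$ for all large
$n$. As $n^{-2}$ is summable, the Borel-Cantelli lemma gives $\limsup_{n} \|Q(X)\| \le (1+\varepsilon)\|\mu_{a,\nu,\gamma}\|$ almost surely, and taking $\varepsilon \to 0$ concludes the
proof. □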
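The first step of the proof above uses the elementary fact that, for a symmetric matrix $Q$ and an even integer $l$, $\|Q\|^l \le \operatorname{Tr} Q^l \le p\,\|Q\|^l$, so a bound on $\mathbb{E}[\operatorname{Tr} Q^l]$ becomes a tail bound on $\|Q\|$ through Markov's inequality. The following is a small numerical sanity check only; the $2 \times 2$ matrix `Q` and the helper names are hypothetical and do not come from the paper.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_power(q, l):
    """Tr(q^l), computed by repeated multiplication."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(l):
        result = mat_mul(result, q)
    return sum(result[i][i] for i in range(n))

def spectral_norm_2x2_symmetric(q):
    """Largest absolute eigenvalue of a symmetric 2x2 matrix (closed form)."""
    (a, b), (_, d) = q
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return max(abs(mean + radius), abs(mean - radius))

# Hypothetical symmetric test matrix (illustration only).
Q = [[0.3, -1.1], [-1.1, 0.7]]
norm_Q = spectral_norm_2x2_symmetric(Q)

for l in (2, 4, 8, 16):
    tr = trace_power(Q, l)
    # ratio lies in [1, p] with p = 2, and approaches 1 as l grows
    print(l, tr / norm_Q ** l)
```

As $l$ grows the ratio $\operatorname{Tr} Q^l / \|Q\|^l$ approaches 1 when the top eigenvalue is simple in absolute value, which is why taking $l \asymp \log n$ loses only a constant factor per power in the argument.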


As $\|\mu_{a,\nu,\gamma}\|$ is continuous in $a$, $\nu$, and $\gamma$, Proposition 5.2 in the case $a = 0$ may be established via
a continuity argument:


Proof of Proposition 5.2 (Case $a = 0$). For any $\alpha > 0$, let $k_\alpha(x) = k(x) + \alpha x$, and let $Q_\alpha(X)$ be the
matrix as defined in Definition 5.1 for the kernel function $k_\alpha$. Then $Q_\alpha(X) = Q(X) + \frac{\alpha}{n} V(X)$, where
$V(X)$ has zero diagonal and equals $XX^T$ off of the diagonal. By Proposition 5.2 for the $a \neq 0$
case, established above, $\limsup_{n,p \to \infty} \|Q_\alpha(X)\| \le \|\mu_{\alpha,(\nu+\alpha^2),\gamma}\|$. By standard results for covariance
matrices (see e.g. [29]), $\limsup_{n,p \to \infty} \|\frac{1}{n} V(X)\| \le C_\gamma$ almost surely under our assumptions on $X$, for a
constant $C_\gamma > 0$. By the triangle inequality, this implies $\limsup_{n,p \to \infty} \|Q(X)\| \le \|\mu_{\alpha,(\nu+\alpha^2),\gamma}\| + \alpha C_\gamma$ for any $\alpha > 0$, and the
desired result follows by taking $\alpha \to 0$. □

6. Analyzing the remainder matrices 


To conclude the proof of Theorem 1.7, we analyze in this section the remainder matrices $R(X)$
and $S(X)$ of Definition 5.1.


Lemma 6.1. As $n, p \to \infty$ with $p/n \to \gamma \in (0, \infty)$, $\|S(X)\| \to 0$ almost surely.


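Proof. Note that $\|S(X)\| \le \|S(X)\|_F \le p \max_{1 \le i,i' \le p} |s_{ii'}|$, where $\|\cdot\|_F$ is the Frobenius norm. By
Definition 5.1 and Proposition 4.1, for any $1 \le i, i' \le p$ and $\alpha > 0$, $|s_{ii'}| \le n^{-3/2+\alpha} \sum_{d=1}^{D} |a_d|$ with
probability at least $1 - n^{-4}$, for all large $n$. Then $p \max_{1 \le i,i' \le p} |s_{ii'}| \le C p n^{-3/2+\alpha}$ with probability
at least $1 - p^2 n^{-4}$, by a union bound over the $p^2$ entries. Taking any $\alpha < 1/2$ and applying the Borel-Cantelli lemma yields the desired result. □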
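The proof rests on the chain spectral norm $\le$ Frobenius norm $\le p \cdot \max_{i,i'} |s_{ii'}|$. A minimal sketch checking this chain numerically is given here; the random matrix, its size, and the helper `spectral_norm_estimate` are hypothetical and only serve as an illustration, not as the paper's construction of $S(X)$.

```python
import math
import random

def spectral_norm_estimate(s, iters=200, seed=0):
    """Power iteration on S^T S. Returns ||S v|| for a unit vector v,
    which is a lower estimate of the spectral norm of S."""
    rng = random.Random(seed)
    p = len(s)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(s[i][j] * v[j] for j in range(p)) for i in range(p)]   # S v
        u = [sum(s[i][j] * w[i] for i in range(p)) for j in range(p)]   # S^T S v
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return 0.0
        v = [x / norm_u for x in u]
    w = [sum(s[i][j] * v[j] for j in range(p)) for i in range(p)]
    return math.sqrt(sum(x * x for x in w))

# Hypothetical p x p matrix with small entries (illustration only).
rng = random.Random(1)
p = 6
S = [[rng.uniform(-1e-3, 1e-3) for _ in range(p)] for _ in range(p)]

spec = spectral_norm_estimate(S)
frob = math.sqrt(sum(x * x for row in S for x in row))
entry_bound = p * max(abs(x) for row in S for x in row)
print(spec, frob, entry_bound)
```

The entrywise bound is crude, but it suffices in the lemma because each entry is of order $n^{-3/2+\alpha}$ while $p$ is of order $n$.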

Definition 6.2. For d > 2, define Rd{X) = ( r^ : 1 < i,i! < p) G M pxp with entries 


ru> = 



d -1 


C x ih x i'ji “ x ) n 


a=2 


i / i' 


i = i. 


Note that $R(X)$ in Definition 5.1 is given by $R(X) = \sum_{d=2}^{D} a_d R_d(X)$.

Lemma 6.3. As $n, p \to \infty$ with $p/n \to \gamma \in (0, \infty)$, $\|R_d(X)\| \to 0$ almost surely for any $d \ge 3$.

Proof. Letting $i_7 := i_1$ for notational convenience, note that

(dX> 

2 ) ,.,-s 


E[Tri? d (X) 6 ] = V 


E 


il ,...,16=1 

*17^*2,*27^3v,*67^*1 


n 

L.s=l 


^s^s +1 


2t „-3(ctfl )y 


where we set 


V d := 


E 


E 


(d!) 


E ^(hj) 


(17) 


* 17 ^* 2 , * 27 ^* 3 , •■■ 1*6 7^*1 


ib 


iiAiiA'-Aia-i 


i =1 

iiAi^A—Aid-i 


Ji vJj_r 


and 


X(i,j) := E 


n (*■ 


d —1 


2 2 


»n 


X isja X is+lja 


(18) 


_s=l \ a =2 y 

If there is some $j^* \in \{1, \ldots, n\}$ such that $j_a^s = j^*$ for exactly one pair of indices $(s, a) \in \{1, \ldots, 6\} \times \{1, \ldots, d-1\}$,
then as $i_s \neq i_{s+1}$, independence of the entries of $X$ implies $\mathcal{X}(i, j) = 0$. Hence,
we may restrict the sum in (17) to terms where each index $j_a^s$ equals some other index $j_{a'}^{s'}$. Then
the number of distinct indices $j_a^s$ is at most $6(d-1)/2 = 3d - 3$. Furthermore, it is clear that
$|\mathcal{X}(i, j)| \le C$ always, for a constant $C$ independent of $n$ and $p$.

We now consider several cases for a nonzero term $\mathcal{X}(i, j)$, depending on the number of distinct
indices among $\{i_1, \ldots, i_6\}$:

Case 1: $|\{i_1, \ldots, i_6\}| \le 4$. Letting $V_{d,4}$ denote the sum over terms of (17) belonging to this
case, the above implies $V_{d,4} \le C p^4 n^{3d-3}$ for a constant $C$ independent of $n$ and $p$.

Case 2: |{ii,..., « 6 }| = 5. Then either i s = i s+ 2 or i s = i s+ 3 for some s (where s + 2 and s + 3 
are taken modulo 6 ), with the remaining indices all distinct. Suppose without loss of generality 
that i '2 and is are distinct from each other and from {*i, * 4 ,^ 5 , * 6 }- Let i* = 12 and j* = j\ (which 
exists when d > 3). By the distinctness conditions in ( |l7| ), j* 7 ^ j x a for all a / 2. If furthermore 
j* 7= fa for all a E {1,..., d — 1}, then Xi*j* appears exactly once in (18), so X(i, j) = 0. If j* = jf, 
then Xi*j* appears twice, once as the term x i2 ji and once in the term (x^ . 2 . 2 — 1). The product 


2 — Xi*j *, and as E[xf*^*] = 0 and E [xi*j*] = 0, this also implies X(i, j) = 0. 


of these terms is x%,*x.„.- 
Hence we must have j* = j\ = j% for some a > 2. The same argument applied to i* := is and 
j* := j% shows that we must have j* = j 2 = j 3 , for some a' > 2. Then j\ = = j 3 >, so there 

cannot be exactly $3d - 3$ distinct indices $j_a^s$. Then there are at most $3d - 4$ such distinct indices, and
letting $V_{d,5}$ denote the sum over terms of (17) belonging to this case, we obtain $V_{d,5} \le C p^5 n^{3d-4}$
for a constant $C$.

Case 3: |{ii,..., * 6 }| = 6 . Then all indices i\,..., i§ are distinct. Applying the argument of 
Case 2 , there exists a > 2 such that j\ = j 2 . There exists further a’ > 2 such that j 2 = j 3 ,, a" > 2 
such that j 3 , = ;Q ,, etc., and for each s = 1,..., 6 we obtain some a > 2 such that j\ = j s a . Then 
the number of distinct indices is at most $\frac{6(d-1) - 6}{2} + 1 = 3d - 5$. Letting $V_{d,6}$ denote the sum over
terms of (17) belonging to this case, we obtain $V_{d,6} \le C p^6 n^{3d-5}$.


Putting the cases together,

\[
\mathbb{E}[\operatorname{Tr} R_d(X)^6] \le C n^{-3(d+1)} (V_{d,4} + V_{d,5} + V_{d,6}) \le C n^{-2}
\]

for a constant $C > 0$ and all large $n$ and $p$. Then for any $\varepsilon > 0$,

\[
\mathbb{P}[\|R_d(X)\| > \varepsilon] \le \frac{\mathbb{E}[\operatorname{Tr} R_d(X)^6]}{\varepsilon^6} \le \frac{C}{\varepsilon^6 n^2},
\]

so $\limsup_{n,p \to \infty} \|R_d(X)\| \le \varepsilon$ almost surely, and the result follows by taking $\varepsilon \to 0$. □

Lemma 6.4. Let $R_d(X)$ be as in Definition 6.2, and let $\tilde R(X)$ be as in Theorem 1.7. As $n, p \to \infty$ with
$p/n \to \gamma \in (0, \infty)$, $\|a_2 R_2(X) - \tilde R(X)\| \to 0$ almost surely.

Proof. Let $T(X) = a_2 R_2(X) - \tilde R(X)$. Then $T(X)$ has entries



,/9 n 4-7=1 


r 2 2 


-1 )-(4.-i)-(4,.-i)) 


2 


l = % 


-i) 


% = % 


Thus, excluding the diagonal, T(X) equals 2 YY T where Y = (r/y) € M pxn and yij = — 1. 

Under the assumption (ph, ^||FY T || converges to a finite limit almost surely (see e.g. [29]), so 


n 


\\YY t \\ -a 0. Furthermore, T(X) — 2 YY T is diagonal and T(X) — 


>-YY t 


easily verified by a union bound, implying ||T(Af)|| 
We now conclude the proof of Theorem |1.7[ 


0 . 


0 is 

□ 


Proof of Theorem 1.7. Recall Definitions 5.1 and 6.2 and the decompositions $K(X) = Q(X) + R(X) + S(X)$
and $R(X) = \sum_{d=2}^{D} a_d R_d(X)$. Proposition 5.2 and Lemmas 6.1, 6.3, and 6.4 imply
$\limsup_{n,p \to \infty} \|K(X) - \tilde R(X)\| \le \|\mu_{a,\nu,\gamma}\|$. The conditions of Theorem 1.1 are verified for the kernel
function $k(x)$ as in the proof of Corollary 1.5 in Section 2. Furthermore, $K(X)$ and $\tilde K(X) := K(X) - \tilde R(X)$
have the same limiting empirical spectral distribution, since $\tilde R(X)$ has finite rank.
Then Theorem 1.1 implies $\liminf_{n,p \to \infty} \|\tilde K(X)\| \ge \|\mu_{a,\nu,\gamma}\|$ almost surely, and this establishes
property (1) of Theorem 1.7.

To verify the claim regarding the non-zero eigenvalues of $\tilde R(X)$ in property (2) of Theorem 1.7,
we compute from (9) TrR(A') = ^^v(X) T l and Ti'R(X ) 2 = ^{(v{X ) T \] 2 + p||u(-Y)|| 2 ). 


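If $\lambda_1$ and $\lambda_2$ are the two non-zero eigenvalues of $\tilde R(X)$, then $\lambda_1 + \lambda_2 = \operatorname{Tr} \tilde R(X)$ and $\lambda_1 \lambda_2 =
\frac{1}{2}((\lambda_1 + \lambda_2)^2 - \lambda_1^2 - \lambda_2^2) = \frac{1}{2}((\operatorname{Tr} \tilde R(X))^2 - \operatorname{Tr} \tilde R(X)^2)$, so $\lambda_1$ and $\lambda_2$ are the roots of the equation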


A 2 - 


02\/2 


n 


v(X) T l)x + ^((v(X) T lf-p\\v(X)f)=0. 


By the law of large numbers, n l v(X) T \ -A 0 and (p/n 2 )\\v(X)\\ 2 -A 7 2 (Ex^- — 1) almost surely. 
Since the roots of a polynomial are continuous in its coefficients, the result follows. □ 
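The last step recovers the two non-zero eigenvalues of a rank-two symmetric matrix from its first two trace moments: they are the roots of $\lambda^2 - (\operatorname{Tr} R)\lambda + \frac{1}{2}((\operatorname{Tr} R)^2 - \operatorname{Tr} R^2) = 0$. A minimal sketch of this computation follows. The test matrix $v \mathbf{1}^T + \mathbf{1} v^T$ is an assumption suggested by the form of the trace expressions above (all scaling constants are omitted), and the vector `v` is hypothetical.

```python
import math

def nonzero_eigs_from_traces(tr1, tr2):
    """Roots of lambda^2 - tr1*lambda + (tr1^2 - tr2)/2 = 0, where
    tr1 = Tr R and tr2 = Tr R^2 for a symmetric matrix R of rank at most 2."""
    product = (tr1 ** 2 - tr2) / 2
    disc = math.sqrt(tr1 ** 2 - 4 * product)  # equals sqrt(2*tr2 - tr1^2)
    return (tr1 + disc) / 2, (tr1 - disc) / 2

# Hypothetical vector; R = v 1^T + 1 v^T has entries v_i + v_j.
v = [0.5, -1.0, 2.0, 0.25]
p = len(v)
R = [[v[i] + v[j] for j in range(p)] for i in range(p)]

tr1 = sum(R[i][i] for i in range(p))
tr2 = sum(R[i][j] * R[j][i] for i in range(p) for j in range(p))
lam_plus, lam_minus = nonzero_eigs_from_traces(tr1, tr2)

# For this R the non-zero eigenvalues are v^T 1 +/- sqrt(p) * ||v||.
s = sum(v)
norm_v = math.sqrt(sum(x * x for x in v))
print(lam_plus, s + math.sqrt(p) * norm_v)
print(lam_minus, s - math.sqrt(p) * norm_v)
```

Because the roots depend continuously on the two trace moments, almost-sure limits of the traces translate directly into almost-sure limits of the eigenvalues.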


Appendix A. Combinatorial results 


This appendix contains the proofs of Lemmas 5.5, 5.7, and 5.13 used in Section 5, as well as the
proof of Proposition 5.17 and the explicit construction of the map $\varphi$ in that proposition.


A.1. Proof of Lemmas 5.5, 5.7, and 5.13. We restate the lemmas using their original numbering.


Lemma 5.13. Suppose a simple-labeling of an $l$-graph has $k$ n-vertices with non-empty label and
$\tilde m$ total distinct p-labels and distinct non-empty n-labels. Then $\tilde m \le \frac{l+k}{2} + 1$.

Proof. Let $I = \{1, \ldots, p\}$ and $J = \{1, \ldots, n\}$, and consider an undirected graph $G$ on the vertex
set $I \sqcup J$ (the disjoint union of $I$ and $J$ with $n + p$ elements, treating elements of $I$ and the elements
of $J$ as distinct). Let $G$ have an edge between $i, i' \in I$ if there are three consecutive vertices (p,
n, p) of the $l$-graph with the labels $i, \emptyset, i'$ or $i', \emptyset, i$. Let $G$ have an edge between $i \in I$ and
$j \in J$ if there are two consecutive vertices of the $l$-graph such that the p-vertex has label $i$ and the
n-vertex has label $j$. The number of vertices of $G$ incident to at least one edge is $\tilde m$, and $G$ must
be connected, so it has at least $\tilde m - 1$ edges. An edge in $G$ between $i, i' \in I$ corresponds to at least
two consecutive pairs of p-vertices in the $l$-graph having an n-vertex with empty label in between,
by condition (3) of Definition 5.12, so the number of such edges is at most $\frac{l-k}{2}$. Similarly, an edge
in $G$ between $i \in I$ and $j \in J$ corresponds to at least two pairs of consecutive n and p-vertices of
the $l$-graph such that the n-vertex has non-empty label, by condition (2) of Definition 5.12, so the number of
such edges is at most $\frac{2k}{2}$. Then $\tilde m - 1 \le \frac{l+k}{2}$. □
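The bound of Lemma 5.13 can be checked by brute force on small cases. The sketch below enumerates all labelings of a small $l$-graph and tests $\tilde m \le \frac{l+k}{2} + 1$ on those that are simple-labelings. The three conditions encoded in `is_simple_labeling` are a paraphrase, assumed from how Definition 5.12 is used in the surrounding proofs (consecutive p-labels differ; each pair of a p-label and a non-empty n-label occurs on an even number of edges; the ordered pair $(i, i')$ around an empty n-vertex occurs as often as $(i', i)$). The function names and the representation of the empty label by `None` are hypothetical.

```python
from collections import Counter
from itertools import product

def is_simple_labeling(p_labels, n_labels):
    """Check the three paraphrased conditions; None is the empty n-label.
    The s-th n-vertex sits between p-vertices s and s+1 (cyclically)."""
    l = len(p_labels)
    z_count = Counter()  # edges with p-label i and non-empty n-label j
    w_count = Counter()  # ordered p-label pairs around an empty n-vertex
    for s in range(l):
        i, i_next, j = p_labels[s], p_labels[(s + 1) % l], n_labels[s]
        if i == i_next:                       # condition (1)
            return False
        if j is None:
            w_count[(i, i_next)] += 1
        else:
            z_count[(i, j)] += 1
            z_count[(i_next, j)] += 1
    if any(c % 2 for c in z_count.values()):  # condition (2)
        return False
    # condition (3)
    return all(w_count[(a, b)] == w_count[(b, a)] for (a, b) in list(w_count))

def check_lemma_5_13(l, p, n):
    """Return (number of simple-labelings found, whether the bound held for all)."""
    valid = 0
    for p_labels in product(range(p), repeat=l):
        for n_labels in product([None] + list(range(n)), repeat=l):
            if not is_simple_labeling(p_labels, n_labels):
                continue
            valid += 1
            k = sum(j is not None for j in n_labels)
            m = len(set(p_labels)) + len({j for j in n_labels if j is not None})
            if 2 * m > l + k + 2:             # violates m <= (l + k)/2 + 1
                return valid, False
    return valid, True

print(check_lemma_5_13(4, 3, 2))
```

Such an enumeration is exponential in $l$ and is meant only as a consistency check for very small graphs, not as a substitute for the counting argument in the proof.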


Turning now to multi-labelings, for each $j \in \{1, 2, 3, \ldots\}$ and a given multi-labeling, let us denote
throughout

$N_j$ := number of appearances of $j$ as an n-label.

Then the following two lemmas hold:


Lemma A.1. In any multi-labeling of an $l$-graph, each $j$ that appears as an n-label has $N_j \ge 2$.


Proof. Suppose that an n-label $j$ appears only once. The two p-vertices preceding and following
that n-vertex must have distinct labels, say $i_1$ and $i_2$, by condition (1) of Definition 5.4. Then
exactly one edge in the $l$-graph has p-vertex endpoint labeled $i_1$ and n-vertex endpoint having label
$j$ (and similarly for $i_2$ and $j$), contradicting condition (3) of Definition 5.4. □


Lemma A.2. Suppose a multi-labeling of an $l$-graph has at most $\frac{l}{2}$ distinct p-labels. If this
multi-labeling has excess $\Delta$, then

\[
\sum_{j :\ N_j \ge 3} N_j \le 6\Delta - 6.
\]

Consequently, the number of n-vertices having any label $j$ for which $N_j \ge 3$ is also at most $6\Delta - 6$.


Proof. Observe that if $m$ total distinct p-labels and n-labels appear in the labeling, and at most $\frac{l}{2}$
of these are p-labels, then the labeling has at least $m - \frac{l}{2}$ distinct n-labels. Let $c = |\{j : N_j = 2\}|$.
Then Lemma A.1 implies $2c + 3(m - \frac{l}{2} - c) \le \sum_{s=1}^{l} d_s$ (where $d_1, \ldots, d_l$ are the numbers of
n-labels on the $l$ n-vertices), so $c \ge 3m - \frac{3l}{2} - \sum_{s=1}^{l} d_s$. Then the n-labels in $\{j : N_j = 2\}$
account for at least $6m - 3l - 2\sum_{s=1}^{l} d_s$ of the $\sum_{s=1}^{l} d_s$ total n-labels, implying that at most
$3l + 3\sum_{s=1}^{l} d_s - 6m = 6\Delta - 6$ total n-labels remain. This establishes the first claim, and the second
follows directly from the first. □


We will prove many subsequent claims regarding multi-labelings by induction on $l$. The following
two lemmas describe the base case of the induction and the basic inductive step.


Lemma A.3. Suppose $l = 2$ or $l = 3$. Then for any multi-labeling of the $l$-graph, all $l$ p-labels are
distinct, and all $l$ n-vertices have the same tuple of n-labels, up to reordering.


Proof. That all $l$ p-labels are distinct is a consequence of condition (1) of Definition 5.4. Then by
conditions (2) and (3) of Definition 5.4, the n-vertices immediately preceding and following each
p-vertex must have the same tuple of n-labels, up to reordering. □


Lemma A.4. In a multi-labeling of an $l$-graph with $l \ge 4$, suppose a p-vertex $V$ is such that its
p-label appears on no other p-vertices. Let the n-vertex preceding $V$ be $U$, the p-vertex preceding $U$
be $T$, the n-vertex following $V$ be $W$, and the p-vertex following $W$ be $X$.

(1) If $T$ and $X$ have different p-labels, then the graph obtained by deleting $V$ and $W$ and
connecting $U$ to $X$ is an $(l-1)$-graph with valid multi-labeling.

(2) If $T$ and $X$ have the same p-label, then the graph obtained by deleting $U$, $V$, $W$, and $X$ and
connecting $T$ to the n-vertex after $X$ is an $(l-2)$-graph with valid multi-labeling.

Proof. First consider case (1). As $T$ and $X$ have distinct p-labels, it remains true that no two
consecutive p-vertices in the $(l-1)$-graph have the same p-label, so condition (1) of Definition
5.4 holds. Condition (2) of Definition 5.4 clearly still holds as well. If $V$ has p-label $i$ and $W$
has n-labels $(j_1, \ldots, j_d)$, then $U$ has n-labels $(j_1, \ldots, j_d)$ as well, up to reordering, by conditions
(2) and (3) of Definition 5.4 and the fact that $V$ is the only p-vertex with label $i$. Then in the
$(l-1)$-graph obtained by deleting $V$ and $W$, the number of edges with p-vertex endpoint labeled
$i$ and n-vertex endpoint having label $j_s$ for any $s = 1, \ldots, d$ is zero, and the number of edges with
p-vertex endpoint labeled $i'$ and n-vertex endpoint having label $j'$ is the same as in the original
$l$-graph for all other pairs $(i', j')$. Thus condition (3) of Definition 5.4 still holds as well, so the
$(l-1)$-graph has a valid multi-labeling.

Now consider case (2). $X$ and the p-vertex after $X$ must have different p-labels in the original
$l$-graph, and $T$ has the same p-label as $X$, so condition (1) of Definition 5.4 holds for the $(l-2)$-graph.
Condition (2) of Definition 5.4 clearly still holds as well. Suppose
$V$ has p-label $i_1$, $T$ and $X$ have p-label $i_2$, and $W$ has n-labels $(j_1, \ldots, j_d)$. As in case (1), $U$ must
also have n-labels $(j_1, \ldots, j_d)$ up to reordering. Then in the $(l-2)$-graph obtained by deleting $U$,
$V$, $W$, and $X$, the number of edges with p-vertex endpoint labeled $i_1$ and n-vertex endpoint having
label $j_s$ for any $s = 1, \ldots, d$ is zero, the number of edges with p-vertex endpoint labeled $i_2$ and
n-vertex endpoint having label $j_s$ for any $s = 1, \ldots, d$ is two less than in the original $l$-graph, and
the number of edges with p-vertex endpoint labeled $i'$ and n-vertex endpoint having label $j'$ is the
same as in the original $l$-graph for all other pairs $(i', j')$. Hence condition (3) of Definition 5.4 still
holds as well, so the $(l-2)$-graph has a valid multi-labeling. □

Lemma 5.5. Suppose a multi-labeling of an $l$-graph has $d_1, \ldots, d_l$ n-labels on the first through
$l$-th n-vertices, respectively, and suppose that it has $m$ total distinct p-labels and n-labels. Then

\[
m \le \frac{l + \sum_{s=1}^{l} d_s}{2} + 1.
\]

Proof. We induct on $l$. For $l = 2$, a multi-labeling must have $d_1 = d_2$ and $m = d_1 + 2$, and for
$l = 3$, a multi-labeling must have $d_1 = d_2 = d_3$ and $m = d_1 + 3$, by Lemma A.3. The result is then
easily verified in these two cases.

Suppose by induction that the result holds for $l - 2$ and $l - 1$, and consider a multi-labeling of
an $l$-graph with $l \ge 4$. If each distinct p-label appears at least twice, then there are at most $\frac{l}{2}$
distinct p-labels. Lemma A.1 implies there are at most $\frac{1}{2}\sum_{s=1}^{l} d_s$ distinct n-labels, so
$m \le \frac{l + \sum_{s=1}^{l} d_s}{2}$, establishing the result.

Thus, suppose that some p-vertex $V$ has a label that appears exactly once, and let $T$, $U$, $W$, $X$ be
as in Lemma A.4. If $T$ and $X$ have different p-labels, follow procedure (1) in Lemma A.4 to obtain
a multi-labeling of an $(l-1)$-graph. This multi-labeling now has $m - 1$ total distinct p-labels and
n-labels, and so the induction hypothesis implies $m - 1 \le \frac{l - 1 + \sum_{s=1}^{l} d_s - d}{2} + 1$, where $d$ is the number
of n-labels of the deleted n-vertex $W$. Hence $m \le \frac{l + \sum_{s=1}^{l} d_s}{2} - \frac{d+1}{2} + 2 \le \frac{l + \sum_{s=1}^{l} d_s}{2} + 1$.

If $T$ and $X$ have the same p-label, follow procedure (2) of Lemma A.4 to obtain a multi-labeling
of an $(l-2)$-graph. This multi-labeling has between $m - d - 1$ and $m - 1$ (inclusive) total distinct
p-labels and n-labels, where $d$ is the number of n-labels of the deleted n-vertex $W$. The induction
hypothesis implies $m - d - 1 \le \frac{l - 2 + \sum_{s=1}^{l} d_s - 2d}{2} + 1$, so $m \le \frac{l + \sum_{s=1}^{l} d_s}{2} + 1$. This completes the
induction in both cases, establishing the desired result. □

Lemma 5.7. Suppose a multi-labeling of an $l$-graph has excess $\Delta$. For each $i \in \{1, 2, 3, \ldots\}$ and
$j \in \{1, 2, 3, \ldots\}$, let $b_{ij}$ be the number of edges in the $l$-graph such that the p-vertex endpoint is
labeled $i$ and the n-vertex endpoint has label $j$ in its tuple. Then

\[
\sum_{i,j :\ b_{ij} > 2} b_{ij} \le 12\Delta.
\]

Proof. We induct on $l$. For $l = 2$ or 3, we must have $b_{ij} = 0$ or 2 for all $(i, j)$ by Lemma A.3, and
$\Delta \ge 0$ by Lemma 5.5, so the result holds.

Suppose the result holds for $l - 2$ and $l - 1$, and consider a multi-labeling of an $l$-graph with
$l \ge 4$. If each distinct p-label appears at least twice, then there are at most $\frac{l}{2}$ distinct p-labels, so
Lemma A.2 applies. For any $j$ with $N_j = 2$, we have $b_{ij} = 2$ or $b_{ij} = 0$ for all $i$, by conditions (1)
and (3) of Definition 5.4. For any $j$ with $N_j \ge 3$, we apply the bound $\sum_{i :\ b_{ij} > 2} b_{ij} \le 2N_j$. Then
$\sum_{i,j :\ b_{ij} > 2} b_{ij} \le 2(6\Delta - 6) \le 12\Delta$ by Lemma A.2.

Now suppose that some p-vertex $V$ has a p-label appearing exactly once. Consider the $(l-1)$-graph
or $(l-2)$-graph obtained by Lemma A.4. In the case of the $(l-1)$-graph, it is easily
verified that $\sum_{i,j :\ b_{ij} > 2} b_{ij}$ is the same as in the original $l$-graph, so the induction hypothesis implies
$\sum_{i,j :\ b_{ij} > 2} b_{ij} \le 12\left(\frac{l - 1 + \sum_{s=1}^{l} d_s - d}{2} + 1 - (m - 1)\right) \le 12\Delta$, where $d \ge 1$ is the number of n-labels on
the deleted n-vertex $W$.

In the case of the $(l-2)$-graph, suppose the deleted n-vertex $W$ (and $U$) has $d$ n-labels, of
which $d'$ also appear on an n-vertex different from $W$ and $U$. If $j$ does not appear on $W$ or $U$,
then clearly $b_{ij}$ is the same in the $(l-2)$-graph and the original $l$-graph for all $i$. If $j$ is one
of the $d - d'$ n-label values appearing only on $W$ and $U$, then $b_{ij} = 0$ or 2 in both the $(l-2)$-graph
and the original $l$-graph for all $i$. If $j$ is one of the other $d'$ n-label values appearing on
$W$ and $U$, then in deleting $U$, $V$, $W$, and $X$, we may have reduced $b_{ij}$ by 2 for at most two
distinct values of $i$ (corresponding to the p-labels of $V$ and $X$). This implies that $\sum_{i :\ b_{ij} > 2} b_{ij}$
reduces by at most 8 for this $j$, with the maximal reduction occurring if $b_{ij} = 4$ for both of
these values of $i$ in the original $l$-graph. Then by the induction hypothesis,

\[
\sum_{i,j :\ b_{ij} > 2} b_{ij} - 8d' \le 12\left(\frac{l - 2 + \sum_{s=1}^{l} d_s - 2d}{2} + 1 - (m - 1 - (d - d'))\right),
\]

as the $(l-2)$-graph has $m - 1 - (d - d')$ total distinct n and p-labels. Then
$\sum_{i,j :\ b_{ij} > 2} b_{ij} \le 12\left(\frac{l + \sum_{s=1}^{l} d_s}{2} + 1 - m - d'\right) + 8d' \le 12\Delta$, so the result holds
in this case as well, completing the induction. □


A.2. Construction of the map $\varphi$.

Definition A.5. In an $l$-graph with a multi-labeling, an n-vertex is single if it has only one n-label.
It is a good single if it is single and if its n-label $j$ appears only on single n-vertices. Otherwise,
it is a bad single.

Definition A.6. In an $l$-graph with a multi-labeling, a pair $(V, V')$ of distinct (not necessarily
consecutive) n-vertices is a good pair if the following conditions hold:

(1) $V$ and $V'$ have the same tuple of n-labels, up to reordering,

(2) $V$ and $V'$ are not single, and

(3) $N_j = 2$ for each $j$ appearing as an n-label on $V$ and $V'$ (i.e. this label $j$ appears on no other
n-vertices).

If an n-vertex $V$ is not single and not part of any good pair, then $V$ is a bad non-single.
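Definitions A.5 and A.6 depend only on the tuples of n-labels and the counts $N_j$, so the four-way classification of n-vertices can be computed mechanically. The following is a minimal sketch under two assumptions that are not taken from the text: each n-vertex is represented by a Python tuple of its n-labels, and labels within one tuple are distinct. The function name and the sample input are hypothetical, and the input is not checked for validity as a multi-labeling under Definition 5.4.

```python
from collections import Counter

def classify_n_vertices(tuples):
    """tuples[s] is the tuple of n-labels on the s-th n-vertex.
    Returns one of 'good single', 'bad single', 'good pair',
    'bad non-single' for each n-vertex."""
    counts = Counter(j for t in tuples for j in t)  # N_j for every n-label j
    kinds = []
    for s, t in enumerate(tuples):
        if len(t) == 1:
            j = t[0]
            # good single: the label j appears only on single n-vertices
            only_on_singles = all(len(u) == 1 for u in tuples if j in u)
            kinds.append("good single" if only_on_singles else "bad single")
        else:
            # good pair: another n-vertex carries the same tuple up to
            # reordering, and every label on it has N_j = 2
            has_partner = any(r != s and sorted(u) == sorted(t)
                              for r, u in enumerate(tuples))
            all_twice = all(counts[j] == 2 for j in t)
            kinds.append("good pair" if has_partner and all_twice
                         else "bad non-single")
    return kinds

# Hypothetical n-label tuples around a 7-graph (illustration only).
example = [(1,), (2, 3), (1,), (3, 2), (4, 5), (4,), (5, 4)]
for t, kind in zip(example, classify_n_vertices(example)):
    print(t, kind)
```

In this sample, label 4 appears three times, which makes the vertices carrying $(4, 5)$ and $(5, 4)$ bad non-singles and the vertex carrying $(4)$ a bad single, while the vertices carrying $(2, 3)$ and $(3, 2)$ form a good pair.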


Thus, every n-vertex is either a good single, a bad single, a bad non-single, or part of a good
pair. Conditions (1) and (3) of Definition 5.4 require that, if $(V, V')$ is a good pair, then the two
(distinct) p-labels of the p-vertices preceding and following $V$ are the same as those of the p-vertices
preceding and following $V'$ (but not necessarily in the same order).


Definition A.7. Suppose $(V, V')$ is a good pair of n-vertices. Let the p-vertices preceding and
following $V$ be $U$ and $W$, respectively, and let the p-vertices preceding and following $V'$ be $U'$ and
$W'$, respectively. Then the good pair $(V, V')$ is proper if $U$ has the same label as $W'$ and $U'$ has
the same label as $W$, and it is improper if $U$ has the same label as $U'$ and $W$ has the same label
as $W'$.

Definition A.8. The label-simplifying map is the map from $(p, n)$-multi-labelings of an $l$-graph
to $(p, n+1)$-simple-labelings of an $l$-graph, defined by the following procedure:

(1) While there exists an improper good pair of n-vertices $(V, V')$, iterate the following: Let $W$
be the p-vertex following $V$ and $W'$ be the p-vertex following $V'$, and reverse the sequence
of vertices starting at $W$ and ending at $W'$ (together with their labels).

(2) For each n-vertex in a good pair, relabel it with the empty label.

(3) For each n-vertex that is a bad single or a bad non-single, relabel it with the single label
$n + 1$.


Remark A.9. In the case where there are multiple improper good pairs in step (1) of this procedure,
it will not be important for our later arguments in which order the pairs $(V, V')$ are selected
and which vertex we choose as $V$ and which as $V'$. For concreteness, we may always select $\{V, V'\}$
to be the improper good pair whose sorted n-label-tuple is smallest lexicographically, and we may
take $V$ to come before $V'$ in the $l$-graph cycle.


Lemma A.10. The following are true for the label-simplifying map in Definition A.8:

(1) Step (1) of the procedure in Definition A.8 always terminates in a valid $(p, n)$-multi-labeling
with no improper good pairs.

(2) The image of any $(p, n)$-multi-labeling under the map is a valid $(p, n+1)$-simple-labeling.

(3) If two multi-labelings are equivalent, then their image simple-labelings are also equivalent.

Proof. Clearly each reversal in step (1) of the procedure preserves condition (2) of Definition 5.4,
as well as the number of good pairs and n-labels of each good pair. As $W$ and $W'$ have the same
p-label because $(V, V')$ is improper, it also preserves conditions (1) and (3) of Definition 5.4, so
the resulting labeling is still a valid $(p, n)$-multi-labeling. Each time this reversal is performed, $V$
and $V'$ become consecutive n-vertices in the $l$-graph, and the pair $(V, V')$ becomes a proper good
pair. As $V$ and $V'$ are consecutive, they must remain consecutive under each subsequent reversal,
so their properness is preserved. Hence the procedure must terminate after a number of iterations
at most the total number of good pairs in the multi-labeling, and the final multi-labeling is such
that all good pairs are proper. This establishes (1).

To prove (2), note that the image labeling has either one n-label or the empty label for each
n-vertex. Condition (1) of Definition 5.12 holds for the image labeling by condition (1) of Definition
5.4, as the p-labels are preserved. As all good pairs in the multi-labeling obtained after applying
step (1) of the procedure are proper, and step (2) of the procedure maps their labels to the empty
label, condition (3) of Definition 5.12 holds for the image labeling. Finally, note that if $j$ is an
n-label appearing on good single vertices in the multi-labeling, then condition (2) of Definition 5.12
holds in the image labeling for this $j$ and all p-labels $i$ by condition (3) in Definition 5.4. For the
new n-label $n + 1$ created in step (3) of the map, note that for each $i \in \{1, 2, 3, \ldots\}$ there must
be an even number of edges in the $l$-graph with p-endpoint labeled $i$. Of these, there must be an
even number with n-endpoint $j$ for any good single label $j$, by the above argument, and there must
also be an even number with n-endpoint belonging to a good pair since these edges must come in
pairs. Hence the number of remaining edges adjacent to any p-vertex with label $i$ must also be
even. These are precisely the edges with p-endpoint labeled $i$ and n-endpoint labeled $n + 1$ in the
image labeling, so condition (2) of Definition 5.12 holds for the new n-label $n + 1$ and all p-labels
$i$ as well. Hence the image labeling is a valid $(p, n+1)$-simple-labeling, establishing (2).

(3) is evident, as equivalent multi-labelings have the same proper and improper good pairs of
n-vertices and the same good single n-vertices. □


Definition A.11. Let $\mathcal{C}$ and $\tilde{\mathcal{C}}$ be the set of all multi-labeling equivalence classes and
simple-labeling equivalence classes, respectively, of an $l$-graph. For $\mathcal{L} \in \mathcal{C}$ and any multi-labeling in $\mathcal{L}$,
let $\tilde{\mathcal{L}} \in \tilde{\mathcal{C}}$ contain its image simple-labeling under the label-simplifying map of Definition A.8, and
define $\varphi : \mathcal{C} \to \tilde{\mathcal{C}}$ by $\varphi(\mathcal{L}) = \tilde{\mathcal{L}}$.

A.3. Verification of Proposition 5.17, properties (1) and (2). For the map $\varphi$ of Definition
A.11, property (1) of Proposition 5.17 is evident as the p-labels are preserved. We verify property
(2) by bounding the number of bad non-single n-vertices.

For each pair $i, i' \in \{1, 2, 3, \ldots\}$ with $i < i'$, and for a given multi-labeling, let us denote

$P_{i,i'}$ := number of appearances of $i, i'$ as the p-labels of two consecutive p-vertices (in some order).

Lemma A.12. Suppose a multi-labeling of an $l$-graph has excess $\Delta$. Then

\[
\sum_{i < i' :\ P_{i,i'} \ge 3} P_{i,i'} \le 42\Delta.
\]

Proof. We induct on $l$. For $l = 2$ and 3, $P_{i,i'} = 0$ or 1 for all pairs $i < i'$, and $\Delta \ge 0$ by Lemma 5.5,
so the result holds.

Suppose by induction that the result holds for $l - 2$ and $l - 1$, and consider a multi-labeling of an
$l$-graph with $l \ge 4$. First suppose each distinct p-label appears at least twice, so there are at most
$\frac{l}{2}$ distinct p-labels. If an n-label $j$ is such that $N_j = 2$, then the pairs of p-vertices before and after
the two n-vertices with label $j$ must have the same pairs of p-labels, by conditions (1) and (3) of
Definition 5.4. Thus the number of pairs $i < i'$ with $P_{i,i'} = 1$ is at most the number of n-vertices
for which $N_j \ge 3$ for all of its n-labels $j$. This is at most $6\Delta$ by Lemma A.2. On the other hand,
the number of distinct p-labels is at most one more than the number of distinct pairs of consecutive
p-labels. (This is easily seen by considering the undirected graph with vertices $\{1, \ldots, p\}$ having an
edge between $i, i'$ if and only if some consecutive pair of p-vertices have labels $i$ and $i'$, and noting
that this graph is connected.) Lemma A.1 implies there are at most $\frac{1}{2}\sum_{s=1}^{l} d_s$ distinct n-labels, and
hence at least $m - \frac{1}{2}\sum_{s=1}^{l} d_s - 1 = \frac{l}{2} - \Delta$ distinct pairs $i < i'$ of consecutive p-labels. At least $\frac{l}{2} - 7\Delta$
of these have $P_{i,i'} \ge 2$. If $c$ of these have $P_{i,i'} = 2$, then $2c + 3(\frac{l}{2} - 7\Delta - c) \le l$, so $c \ge \frac{l}{2} - 21\Delta$.
These account for at least $l - 42\Delta$ pairs of consecutive p-vertices, implying that at most $42\Delta$ pairs
of consecutive p-vertices remain. This establishes the result in this case.

Now suppose that there is some p-vertex $V$ whose p-label appears only once. Consider the
$(l-1)$-graph or $(l-2)$-graph obtained by Lemma A.4. It is easily verified that $\sum_{i < i' :\ P_{i,i'} \ge 3} P_{i,i'}$ is
the same in this graph as in the original $l$-graph, because if $P_{i,i'} \ge 3$ in the original $l$-graph, then
neither $i$ nor $i'$ can be the p-label of $V$. On the other hand, our proof of Lemma 5.5 verified that
this $(l-1)$-graph or $(l-2)$-graph has excess at most that of the original $l$-graph, so the desired
result follows from the induction hypothesis. □

The next lemma bounds the number of bad non-single n-vertices, i.e. it shows that in any
multi-labeling with small excess $\Delta$, most of the non-single n-vertices must belong to a good pair.

Lemma A.13. Suppose a multi-labeling of an $l$-graph has excess $\Delta$ and $k$ single n-vertices. Then
there are at least $\frac{l-k}{2} - 48\Delta$ good pairs of n-vertices.

Proof. Let $m$ be the number of distinct n and p-labels and let $d_1, \ldots, d_l$ be the numbers of n-labels
on the $l$ n-vertices. We induct on $l$. If $l = 2$, then Lemma A.3 implies $d_1 = d_2$, $m = d_1 + 2$, and
$\Delta = 0$. If $d_1 = d_2 = 1$, then $k = 2$ and there are no good pairs, and if $d_1 = d_2 \ge 2$, then $k = 0$ and
there is one good pair. Hence the result holds. If $l = 3$, then Lemma A.3 implies $d_1 = d_2 = d_3$,
$m = d_1 + 3$, and $\Delta = \frac{d_1 - 1}{2}$. If $d_1 = d_2 = d_3 = 1$, then $k = 3$, $\Delta = 0$, and there are no good pairs.
If $d_1 = d_2 = d_3 \ge 2$, then $k = 0$, $\Delta \ge \frac{1}{2}$, and there are still no good pairs. In either case, the result
also holds.

Consider $l \ge 4$, and assume by induction that the result holds for $l - 2$ and $l - 1$. First suppose
each distinct p-label appears at least twice, so there are at most $\frac{l}{2}$ distinct p-labels. By Lemma
A.2, there are at most $6\Delta$ n-vertices with some n-label $j$ such that $N_j \ge 3$, so there are at least
$l - k - 6\Delta$ non-single n-vertices for which each of its n-labels $j$ has $N_j = 2$. Let $V$ be one such
n-vertex. We consider three cases:

Case 1: $V$ has two n-labels $j_1$ and $j_2$ that appear on two different other n-vertices $W_1$ and $W_2$.
Then Definition 5.4 implies that the three pairs of consecutive p-vertices around $V$, $W_1$, and $W_2$
must have the same pair of p-labels. By Lemma A.12, there are at most $42\Delta$ such n-vertices $V$.

Case 2: All n-labels of $V$ appear on a single other n-vertex $W_1$, but $W_1$ has some additional
n-label $j$ not appearing on $V$. Then either all such additional n-labels $j$ have $N_j \ge 3$, or there is
some such $j$ with $N_j = 2$. In the former case, the number of such vertices $W_1$ is at most $6\Delta$ by
Lemma A.2. As $V$ is the unique n-vertex sharing an n-label $j$ with $W_1$ for which $N_j = 2$, this
implies the number of such vertices $V$ is also at most $6\Delta$. In the latter case, $j$ appears on a vertex
$W_2$ distinct from $V$ and $W_1$. Then the three pairs of p-vertices around $V$, $W_1$, and $W_2$ must have
the same pair of p-labels, and by Lemma A.12, the number of such vertices $V$ is at most $42\Delta$. Hence
the number of n-vertices $V$ belonging to this case is at most $48\Delta$.

Case 3: $V$ forms a good pair with some other vertex $V'$. By the bounds in cases 1 and 2, there
are at least $l - k - 96\Delta$ such vertices $V$, hence at least $\frac{l-k}{2} - 48\Delta$ good pairs, and the result holds.

Now suppose there is some $p$-vertex $V$ whose $p$-label appears only once. Let $T$, $U$, $W$, $X$ be as in Lemma A.4 and recall that $U$ and $W$ have the same $n$-labels up to reordering. Consider four cases:

Case 1: $T$ and $X$ have different $p$-labels, and $U$ and $W$ are single. Lemma A.4 yields an $(l-1)$-graph with $k - 1$ single $n$-vertices, $\sum_{s=1}^l d_s - 1$ total $n$-labels, and $m - 1$ total distinct $p$- and $n$-labels. By the induction hypothesis, this $(l-1)$-graph has at least

$$\frac{(l-1) - (k-1)}{2} - 48\left(\frac{(l-1) + \left(\sum_{s=1}^l d_s - 1\right)}{2} + 1 - (m-1)\right) = \frac{l-k}{2} - 48\Delta$$

good pairs, which are also good pairs in the $l$-graph.

Case 2: $T$ and $X$ have different $p$-labels, and $U$ and $W$ each have $d \geq 2$ $n$-labels. Lemma A.4 yields an $(l-1)$-graph with $k$ single $n$-vertices, $\sum_{s=1}^l d_s - d$ total $n$-labels, and $m - 1$ distinct $p$- and $n$-labels. By the induction hypothesis, this $(l-1)$-graph has at least

$$\frac{(l-1) - k}{2} - 48\left(\frac{(l-1) + \left(\sum_{s=1}^l d_s - d\right)}{2} + 1 - (m-1)\right) \geq \frac{l-k}{2} - 48\Delta + 1$$

good pairs. It can have at most one more good pair than the original $l$-graph (which occurs if $W$ has a tuple of $n$-labels appearing on exactly three different $n$-vertices in the $l$-graph).

Case 3: $T$ and $X$ have the same $p$-label, and $U$ and $W$ are single. Lemma A.4 yields an $(l-2)$-graph with $k - 2$ single $n$-vertices, $\sum_{s=1}^l d_s - 2$ total $n$-labels, and either $m - 2$ distinct $p$- and $n$-labels if $U$ and $W$ have an $n$-label appearing only those two times, or $m - 1$ distinct $p$- and $n$-labels otherwise. Supposing the former, this $(l-2)$-graph has at least

$$\frac{(l-2) - (k-2)}{2} - 48\left(\frac{(l-2) + \left(\sum_{s=1}^l d_s - 2\right)}{2} + 1 - (m-2)\right) = \frac{l-k}{2} - 48\Delta$$

good pairs, and it has the same number of good pairs as the original $l$-graph. Supposing the latter, this $(l-2)$-graph has at least

$$\frac{(l-2) - (k-2)}{2} - 48\left(\frac{(l-2) + \left(\sum_{s=1}^l d_s - 2\right)}{2} + 1 - (m-1)\right) \geq \frac{l-k}{2} - 48\Delta + 1$$

good pairs, and it can have at most one more good pair than the original $l$-graph (which occurs if the $(l-2)$-graph has a good pair containing the $n$-label of the removed vertices $U$ and $W$).

Case 4: $T$ and $X$ have the same $p$-label, and $U$ and $W$ each have $d \geq 2$ $n$-labels. Lemma A.4 yields an $(l-2)$-graph with $k$ single $n$-vertices, $\sum_{s=1}^l d_s - 2d$ total $n$-labels, and between $m - d - 1$ and $m - 1$ (inclusive) distinct $p$- and $n$-labels. If it has exactly $m - d - 1$ distinct $p$- and $n$-labels, then we must have removed a good pair, and the $(l-2)$-graph has at least

$$\frac{(l-2) - k}{2} - 48\left(\frac{(l-2) + \left(\sum_{s=1}^l d_s - 2d\right)}{2} + 1 - (m - d - 1)\right) = \frac{l-k}{2} - 48\Delta - 1$$

good pairs. If, instead, the $(l-2)$-graph has $m - c - 1$ distinct $p$- and $n$-labels for $0 \leq c < d$, then $U$ and $W$ cannot be a good pair in the original $l$-graph as they have $d - c$ $n$-labels $j$ for which $N_j \geq 3$, and the $(l-2)$-graph can have at most $d - c$ more good pairs than the $l$-graph, one for each such $j$. The $(l-2)$-graph has at least

$$\frac{(l-2) - k}{2} - 48\left(\frac{(l-2) + \left(\sum_{s=1}^l d_s - 2d\right)}{2} + 1 - (m - c - 1)\right) \geq \frac{l-k}{2} - 48\Delta + d - c$$

good pairs. In all cases, we establish that the $l$-graph has at least $\frac{l-k}{2} - 48\Delta$ good pairs, completing the induction. □


Proof of Proposition 5.17, property (2). Let $\mathcal{L} \in \mathcal{C}$ be any multi-labeling equivalence class. Let $\varphi(\mathcal{L})$ have $k$ $n$-vertices with non-empty label. This means $\mathcal{L}$ has $k$ $n$-vertices that do not belong to a good pair. These vertices have at least $k$ total $n$-labels in $\mathcal{L}$, implying that there are at most $\sum_{s=1}^l d_s - k$ total $n$-labels on the good pair vertices. These good pair vertices account for at most $\frac{\sum_{s=1}^l d_s - k}{2}$ distinct $n$-labels in $\mathcal{L}$, and these are mapped to the empty label under the label-simplifying map. Furthermore, by Lemma A.13 there are at most $96\Delta(\mathcal{L})$ bad non-single $n$-vertices, and these have at most $96D\Delta(\mathcal{L})$ additional distinct $n$-labels that are mapped to the new $n$-label $n+1$. Any bad single $n$-vertex has an $n$-label that is the same as one of these $96D\Delta(\mathcal{L})$ distinct $n$-labels (otherwise it is a good single by definition), and the $n$-label of any good single $n$-vertex is preserved under the label-simplifying map. Hence, if $m$ is the number of total distinct $p$- and $n$-labels in $\mathcal{L}$ and $\tilde{m}$ is the number of total distinct $p$-labels and non-empty $n$-labels in $\varphi(\mathcal{L})$, then $\tilde{m} \geq m - \frac{\sum_{s=1}^l d_s - k}{2} - 96D\Delta(\mathcal{L})$, so $\Delta(\varphi(\mathcal{L})) = \frac{l+k}{2} + 1 - \tilde{m} \leq (96D + 1)\Delta(\mathcal{L})$. Hence property (2) holds. □


A.4. Verification of Proposition 5.17, property (3). Recall that we order the vertices of an $l$-graph according to a cyclic traversal starting from an (arbitrary) $p$-vertex.


Definition A.14. The canonical simple labeling in a simple labeling equivalence class $\tilde{\mathcal{L}}$ is the one in which each $i$th new $p$-vertex label that appears in the cyclic traversal is $i$, and each $j$th new non-empty $n$-vertex label is $j$.

The canonical multi-labeling in a multi-labeling equivalence class $\mathcal{L}$ is the one in which each $i$th new $p$-vertex label is $i$ and each $j$th new $n$-vertex label is $j$, with the new $n$-vertex labels in the label-tuple for each $n$-vertex appearing in sorted order.

Each $\tilde{\mathcal{L}}$ has a unique canonical simple labeling, which is an $(l, l)$-simple labeling, and each $\mathcal{L}$ has a unique canonical multi-labeling, which is an $(l, Dl)$-multi-labeling.

For each $\tilde{\mathcal{L}} \in \tilde{\mathcal{C}}$ and $\Delta_0 \geq 0$, property (3) of Proposition 5.17 is a bound on a certain weighted cardinality of the set

$$\mathcal{S}(\Delta_0, \tilde{\mathcal{L}}) := \varphi^{-1}(\tilde{\mathcal{L}}) \cap \{\mathcal{L} : \Delta(\mathcal{L}) = \Delta_0\}.$$

We describe a series of non-determined steps by which the mapping $\varphi$ may be “inverted” to obtain the canonical multi-labeling $L$ of any $\mathcal{L} \in \varphi^{-1}(\tilde{\mathcal{L}})$, given $\tilde{\mathcal{L}}$:

(1) Choose a non-empty $n$-label value appearing in $\tilde{\mathcal{L}}$ to be “$n+1$”, or assume there is no such label. (The $n$-vertices with empty label will be the good pairs, and the remaining $n$-vertices with label different from “$n+1$” will be the good singles.)
(2) Choose a subset $S$ of $n$-vertices with label “$n+1$” to be the bad non-singles in $L$. (The remaining $n$-vertices with label “$n+1$” will be the bad singles.)

(3) For each $n$-vertex in $S$, choose the size of its $n$-label tuple in $L$ to be between 2 and $D$ (inclusive), and pick $n$-labels from $\{1, \ldots, Dl\}$ for that tuple.

(4) For each $n$-vertex with label “$n+1$” not in $S$, pick a single value in $\{1, \ldots, Dl\}$ for its $n$-label in $L$.

(5) For all $n$-vertices with empty label in $\tilde{\mathcal{L}}$, pair them up into good pairs for $L$.

(6) For each good pair, choose the size of its $n$-label tuple in $L$ to be between 2 and $D$ (inclusive), and choose a permutation of the second $n$-label tuple of the pair that matches the first.

(7) Let $Q$ be the set of good pairs $(V, V')$ that are consecutive $n$-vertices in the $l$-graph and such that the $p$-label (in $\tilde{\mathcal{L}}$) of the $p$-vertex between them appears at least twice. Choose an ordered subset of $Q$. For each $(V, V')$ in this subset, if $W$ is the $p$-vertex between $V$ and $V'$, choose some other $p$-vertex $W'$ having the same $p$-label as $W$, and reverse the sequence of vertices from $W$ to $W'$ or from $W'$ to $W$.

(8) Choose $p$-labels for $L$ such that the resulting labeling is canonical and two $p$-vertices have the same label if and only if they do in $\tilde{\mathcal{L}}$. Choose the remaining $n$-labels for $L$ (corresponding to the good pairs and good singles) such that the resulting labeling is canonical, the properties of Definitions A.5 and A.6 are satisfied, and two good single vertices have the same $n$-label if and only if they do in $\tilde{\mathcal{L}}$.

These steps are non-determined in the sense that each step may be performed in multiple ways, yielding many possible output multi-labelings $L$. They “invert” $\varphi$ in the following sense:

Lemma A.15. For any $\mathcal{L} \in \varphi^{-1}(\tilde{\mathcal{L}})$, the canonical multi-labeling $L$ of $\mathcal{L}$ is a possible output of the above procedure.

Proof. Let $L^*$ denote the $(l, Dl)$-multi-labeling obtained by applying step (1) of the label-simplifying map in Definition A.8 to $L$. (It is an $(l, Dl)$-multi-labeling by Lemma A.10.) $L$ may be obtained by the above procedure as follows: Perform steps (1) and (2) to correctly partition the $n$-vertices into the good pair, good single, bad single, and bad non-single $n$-vertices of $L^*$. Perform steps (3) and (4) to recover the $n$-labels in $L^*$ of the bad single and bad non-single $n$-vertices. Perform steps (5) and (6) to correctly identify the good pairs of $L^*$ and the permutation that maps the label-tuple of the second vertex to that of the first vertex in each pair. Perform step (7) to invert the reversals that mapped $L$ to $L^*$ (in the reverse order of how they were applied in the label-simplifying map): This is possible because each reversal in step (1) of the label-simplifying map causes an additional good pair $(V, V')$ of $n$-vertices to become consecutive in the $l$-graph, with the $p$-vertex between them having $p$-label appearing at least twice, and these three vertices remain consecutive after each subsequent reversal. Finally, perform step (8) to recover the $p$-labels and the good single and good pair $n$-labels of $L$, which is possible because (by assumption) $L$ is a valid canonical multi-labeling. □

To obtain the desired weighted cardinality bound for $\mathcal{S}(\Delta_0, \tilde{\mathcal{L}})$, we bound the number of ways each of the above 8 steps may be performed such that the final output $L$ is the canonical multi-labeling for some $\mathcal{L} \in \mathcal{S}(\Delta_0, \tilde{\mathcal{L}})$. The bounds for all but steps (4) and (7) follow from our preceding combinatorial estimates. The following simple lemma will yield a bound for step (7):

Lemma A.16. Suppose a multi-labeling of an $l$-graph has excess $\Delta$. Then there are at most $2\Delta$ good pairs of $n$-vertices such that the two vertices in the pair are consecutive in the $l$-graph cycle and the $p$-label of the $p$-vertex between them appears at least twice in the labeling.


Proof. Call a $p$-vertex “sandwiched” if it is between two consecutive $n$-vertices that form a good pair. Let $i$ be a $p$-label appearing on a total of $b \geq 2$ $p$-vertices, of which $c \geq 1$ are sandwiched. If $b > c$, then change the $c$ appearances of $i$ on the sandwiched $p$-vertices to $c$ new $p$-labels not yet appearing in the labeling. Otherwise if $b = c$ (so $c \geq 2$), then change $c - 1$ appearances of $i$ on the sandwiched $p$-vertices to $c - 1$ new $p$-labels not yet appearing in the labeling. Do this for every such $i$. Note that changing the $p$-label of any sandwiched $p$-vertex does not violate any of the conditions of Definition 5.4, so the resulting labeling is still a valid multi-labeling. If $x$ is the number of good pairs originally satisfying the condition of the lemma, then we have added at least $\frac{x}{2}$ new $p$-labels to the labeling. Hence Lemma 5.5 implies $m + \frac{x}{2} \leq \frac{l + \sum_{s=1}^l d_s}{2} + 1$, so $x \leq 2\Delta$. □


The remaining challenge is to bound the number of ways of performing step (4). This bound 
is not straightforward because the number of bad singles is not necessarily small when $\Delta$ is small.
We instead show that the number of bad singles that we may “freely label” is small: 


Definition A.17. In a multi-labeling of an $l$-graph, $i \in \{1, 2, 3, \ldots\}$ is a connector if it appears as a $p$-label and, among all $n$-vertices that are adjacent to any $p$-vertex with label $i$, exactly two are bad singles and none are bad non-singles; these two bad singles are connected. A sequence of bad singles $W_1, \ldots, W_a$ is a connected cycle if $W_1$ is connected to $W_2$, $W_2$ is connected to $W_3$, etc., and $W_a$ is connected to $W_1$.

Note that “connector” refers to a label $i$, not to any specific $p$-vertex having $i$ as its label, and two “connected” bad singles are adjacent to $p$-vertices having the connector label $i$, but these $p$-vertices may be distinct in the $l$-graph. Each bad single $n$-vertex may be connected to at most two other bad single $n$-vertices (where the connectors are the $p$-labels of its two adjacent $p$-vertices), and hence this notion of connectedness partitions the set of bad single $n$-vertices into connected components that are either individual vertices, linear chains, or cycles.

Motivation for this definition comes from the observation that if two bad single $n$-vertices are connected, then they must have the same $n$-label, as follows from condition (3) of Definition 5.4 and the fact that $n$-labels appearing on good singles and good pairs must be distinct from those appearing on the remaining $n$-vertices.


Lemma A.18. Suppose a multi-labeling of an $l$-graph has excess $\Delta$ and $k$ single $n$-vertices, of which $k'$ are good single and $k - k'$ are bad single. Then at least $k - k' - (288D + 2)\Delta$ distinct $p$-labels are connectors, and there are at most $(192D + 1)\Delta$ connected cycles of bad single $n$-vertices.

Proof. Suppose the multi-labeling is a $(p, n)$-multi-labeling. Construct an undirected multi-graph $G$ with vertex set $\{1, \ldots, p\}$, where each edge of $G$ has one label in $\{1, \ldots, n\}$, as follows: For each $n$-vertex $V$ in the $l$-graph and each $n$-label $j$ of $V$, if $V$ is preceded and followed by $p$-vertices with labels $i_1$ and $i_2$, then add an edge $i_1 \sim i_2$ in $G$ with label $j$. (Thus $G$ has $\sum_{s=1}^l d_s$ total edges.) Condition (3) of Definition 5.4 implies for any $j$, each vertex of $G$ has even degree in the sub-graph consisting of only edges with label $j$.

We will sequentially remove edges of G corresponding to good pairs and good singles, until 
only edges corresponding to bad singles and bad non-singles remain. At any stage of this removal 
process, let us call a vertex of G “active” if there is at least one edge still adjacent to that vertex. 
Let us define a “component” as the set of active vertices that may be reached by traversing the 
remaining edges of G from a particular active vertex. (Hence a component of G is a connected 
component, in the standard sense, that contains at least two vertices.) We will track the quantity 

M = #{active vertices} + #{distinct edge labels} - #{components}.


Initially, $G$ has $m$ active vertices plus distinct edge labels (where $m$ is the number of distinct $n$- and $p$-labels of the $l$-graph), and one component, so $M = m - 1$. Let us remove the edges of $G$ corresponding to good pairs. If an $n$-vertex of a good pair has $d$ $n$-labels, then the good pair corresponds to $2d$ edges between a single pair of vertices in $G$ whose edge labels do not appear elsewhere in $G$. Removing these $2d$ edges removes $d$ distinct edge labels, and if this also changes the connectivity structure of $G$, then either #{components} increases by 1, #{active vertices} decreases by 1, or #{components} decreases by 1 and #{active vertices} decreases by 2. In all cases, $M$ decreases by at most $d + 1$. Then after removing all edges of $G$ corresponding to good pairs,

$$M \geq m - 1 - \frac{\sum_{s=1}^l d_s - k}{2} - \frac{l-k}{2} = k - \Delta,$$

as there are at most $\frac{\sum_{s=1}^l d_s - k}{2}$ distinct $n$-labels for the good pairs and at most $\frac{l-k}{2}$ good pairs.

Let us now remove the edges of $G$ corresponding to good singles. Let $j$ be an $n$-label of a good single, and consider removing the edges of $G$ with label $j$ one at a time. As each vertex of $G$ has even degree in the subgraph of edges with label $j$, when the first such edge is removed, the number of components and active vertices cannot change. Subsequently, the removal of each additional edge might increase #{components} - #{active vertices} by 1 upon considering the same three cases as above. When the last such edge is removed, there are no longer any edges with label $j$ by the definition of a good single, so #{distinct edge labels} decreases by 1. Hence removing all edges with label $j$ decreases $M$ by at most the number of such edges, and $M \geq k - k' - \Delta$ after removing the edges corresponding to all $k'$ good singles.

Call the resulting graph $G'$. Every vertex of $G'$ still has even degree in the subgraph of edges with label $j$, for any $j$. In particular, every active vertex of $G'$ has degree at least two. By Definition A.17, $i \in \{1, \ldots, p\}$ is a connector if and only if $i$ has degree exactly two in $G'$, in which case the edges incident to $i$ in $G'$ must have the same label $j$, and the $n$-vertices with label $j$ in the $l$-graph are the bad singles connected by $i$. A connected cycle of bad singles corresponds to the edges of a cycle of (necessarily distinct) vertices in $G'$ with degree exactly two.

The number of distinct edge labels in $G'$ equals the number of distinct $n$-labels in the $l$-graph appearing on bad non-singles (as any $n$-label appearing on a bad single also appears on some bad non-single). By Lemma A.13 this is at most $96D\Delta$. Hence #{active vertices} - #{components} $\geq k - k' - (96D + 1)\Delta$ for $G'$. The number of total edges in $G'$ is at most $k - k' + 96D\Delta$, with $k - k'$ of them corresponding to bad singles and at most $96D\Delta$ corresponding to bad non-singles. Then the total vertex degree of $G'$ is at most $2(k - k' + 96D\Delta)$. As each active vertex in $G'$ has degree at least two, this implies #{active vertices} $\leq k - k' + 96D\Delta$. Then #{components} $\leq (192D + 1)\Delta$, so there are at most $(192D + 1)\Delta$ connected cycles of bad singles. Furthermore, if there are $x$ connectors (i.e. active vertices with degree exactly two), then since there are at least $k - k' - (96D + 1)\Delta$ active vertices, $2x + 4(k - k' - (96D + 1)\Delta - x) \leq 2(k - k' + 96D\Delta)$, so $x \geq k - k' - (288D + 2)\Delta$. □
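The bookkeeping in the proof of Lemma A.18 can be made concrete on a toy example. The following is a minimal sketch, not part of the paper: the function names `tracked_quantity` and `degree_two_vertices` and the edge-list encoding are assumptions made for illustration, and the toy graph has no self-loops. It computes the quantity M = #{active vertices} + #{distinct edge labels} - #{components} for an edge-labeled multigraph, and lists the vertices of degree exactly two (which, in the graph $G'$ of the proof, are the connectors).

```python
from collections import defaultdict


def tracked_quantity(edges):
    """edges: list of (u, v, label) for an undirected multigraph (hypothetical encoding).

    Returns (#active vertices, #distinct edge labels, #components, M), where a
    vertex is active if it has at least one incident edge.
    """
    adj = defaultdict(set)
    labels = set()
    for u, v, lab in edges:
        adj[u].add(v)
        adj[v].add(u)
        labels.add(lab)
    active = set(adj)
    seen = set()
    components = 0
    for start in active:
        if start in seen:
            continue
        components += 1          # new component found; explore it by DFS
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    m_value = len(active) + len(labels) - components
    return len(active), len(labels), components, m_value


def degree_two_vertices(edges):
    """Vertices whose total degree (counting multi-edges) is exactly two."""
    deg = defaultdict(int)
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(x for x, d in deg.items() if d == 2)


# Toy graph: two edges labeled "a" between 1 and 2, two labeled "b" between 2 and 3.
edges = [(1, 2, "a"), (1, 2, "a"), (2, 3, "b"), (3, 2, "b")]
print(tracked_quantity(edges))      # (3, 2, 1, 4)
print(degree_two_vertices(edges))   # [1, 3]
print(tracked_quantity(edges[2:]))  # (2, 1, 1, 2) after removing the "a" edges
```

In the toy graph every vertex has even degree within each label class, matching the parity property used in the proof; removing the two "a" edges lowers M by 2, consistent with the bound of $d + 1$ for a pair with $d = 1$ labels per vertex.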


Proof of Proposition 5.17, property (3). Let $C$ denote a positive constant that may depend on $D$ and that may change from instance to instance. Fix $\Delta_0 \geq 0$ and $\tilde{\mathcal{L}}$. We upper bound the number of ways in which steps (1)-(8) of the inversion procedure may be performed, such that the resulting multi-labeling $L$ is canonical for some $\mathcal{L} \in \mathcal{S}(\Delta_0, \tilde{\mathcal{L}})$:

There are at most $l + 1$ ways of performing step (1).

By Lemma A.13, to yield $L$ with excess $\Delta_0$, there can be at most $C\Delta_0$ bad non-single $n$-vertices, and hence we must take $|S| \leq C\Delta_0$ in step (2).

To perform step (3), for each vertex in $S$, we may first choose the number of $n$-labels $d$ between 2 and $D$, and then there are at most $(Dl)^d$ ways of choosing the $n$-labels for that vertex.

For step (4), suppose $k'$ good single and $k - k'$ bad single $n$-vertices were identified in steps (1) and (2). By Lemma A.18 there are at least $k - k' - C\Delta_0$ connectors, and any two connected bad single $n$-vertices must be given the same $n$-label. (The $p$-labels of $\tilde{\mathcal{L}}$ are known and are preserved in $L$, so after steps (1) and (2) we know which labels are connectors and which bad singles must be connected in $L$.) Going through the connectors one-by-one, each successive connector constrains the $n$-label of one more bad single $n$-vertex, unless that connector closes a connected cycle. But as there are at most $C\Delta_0$ connected cycles by Lemma A.18, the number of bad single $n$-vertices that we can freely label is at most $C\Delta_0$. Then there are at most $(Dl)^{C\Delta_0}$ ways to perform step (4).

For step (5), recall that the pairs of $p$-vertices surrounding the two $n$-vertices of a good pair must have the same pair of $p$-labels. By Lemma A.12, for all but at most $C\Delta_0$ of the $n$-vertices with empty label, this pairing is uniquely determined, so there are at most $(C\Delta_0)^{C\Delta_0}$ ways of performing step (5).

For step (6), there are $(l - k(\tilde{\mathcal{L}}))/2$ good pairs, and for each pair we may choose the number of $n$-labels $d$ between 2 and $D$ and then one of $d!$ permutations.

Lemma A.16 shows that $|Q| \leq 2\Delta_0$ for step (7). For each element that we add to the ordered subset of $Q$, there are at most $2\Delta_0$ choices for this element and at most $2l$ ways of choosing $W'$ and which half of the cycle to reverse, or we may choose to not add any more elements. We make such a choice at most $2\Delta_0$ times, so there are at most $(4\Delta_0 l + 1)^{2\Delta_0}$ ways of performing step (7).

Finally, there is at most one way to perform step (8), as the labels on the good single and good pair $n$-vertices are distinct from those on the bad single and bad non-single $n$-vertices, and each new $n$-label and $p$-label has a unique choice to make $L$ canonical.

We may incorporate the product $\prod_{s=1}^l |a_{d_s(\mathcal{L})}|/(d_s(\mathcal{L})!)^{1/2}$ on the left side of (16) into the cardinality count by noting that this product contributes $|a_d|/(d!)^{1/2}$ for each vertex in $S$ having $d$ $n$-labels, $a_d^2/d!$ for each good pair having $d$ $n$-labels per vertex of the pair, and $|a_1|$ for each of the $k(\tilde{\mathcal{L}}) - |S|$ single vertices in $L$. Combining the above bounds then yields

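$$\begin{aligned}
\sum_{\mathcal{L} \in \varphi^{-1}(\tilde{\mathcal{L}}):\ \Delta(\mathcal{L}) = \Delta_0} \prod_{s=1}^l \frac{|a_{d_s(\mathcal{L})}|}{(d_s(\mathcal{L})!)^{1/2}}
&\leq (l+1) \sum_S \left(\sum_{d=2}^D (Dl)^d \frac{|a_d|}{(d!)^{1/2}}\right)^{|S|} (Dl)^{C\Delta_0} (C\Delta_0)^{C\Delta_0} \left(\sum_{d=2}^D a_d^2\right)^{\frac{l - k(\tilde{\mathcal{L}})}{2}} (4\Delta_0 l + 1)^{2\Delta_0}\, |a_1|^{k(\tilde{\mathcal{L}}) - |S|} \\
&\leq (l+1)(Cl)^{C\Delta_0}\, |a_1|^{k(\tilde{\mathcal{L}})} (\nu - a^2)^{\frac{l - k(\tilde{\mathcal{L}})}{2}} \sum_S \left(\frac{1}{|a_1|} \sum_{d=2}^D (Dl)^d \frac{|a_d|}{(d!)^{1/2}}\right)^{|S|},
\end{aligned}$$

where $\sum_S$ denotes the sum over all possible sets $S$ selected by step (2), and the second line applies $\Delta_0 \leq Cl$ and $\sum_{d=2}^D a_d^2 = \nu - a^2$. As $|S| \leq C\Delta_0$, this implies by Cauchy-Schwarz

$$\left(\frac{1}{|a_1|} \sum_{d=2}^D (Dl)^d \frac{|a_d|}{(d!)^{1/2}}\right)^{|S|} \leq (Cl)^{C\Delta_0} \left(\frac{\left(\sum_{d=2}^D a_d^2\right)^{1/2}}{|a_1|}\right)^{|S|} \leq (Cl)^{C\Delta_0} \left(\frac{\sqrt{\nu}}{|a_1|}\right)^{C\Delta_0}.$$

The sum is over at most $l^{C\Delta_0}$ possible sets $S$, so this verifies condition (3) of the proposition upon noting that $(Cl)^{C\Delta_0} \leq l^{C_3 + C_4\Delta_0}$ for some constants $C_3, C_4 > 0$ and all $l \geq 2$. □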


Appendix B. Moment bound for a deformed GUE matrix 


In this appendix, we prove Proposition 5.11. Recall Definition 5.10 of $M$, $W$, $V$, and $Z$, which
implicitly depend on p and n. Throughout this section, we will use p and n in place of p and n. 


Lemma B.1. Suppose $n, p \to \infty$ with $p/n \to \gamma$. Then $\|M\| \to \|\mu_{a,\nu,\gamma}\|$ almost surely.

Proof. Recall $M = \sqrt{\frac{\gamma(\nu - a^2)}{p}}\, W + \frac{a}{n} V$, where $V = ZZ^T - D$ and $D = \operatorname{diag}(\|Z_i\|_2^2)$. The empirical spectral distribution of $\frac{1}{n} ZZ^T$ converges weakly almost surely to $\mu_{MP,\gamma}$. By a chi-squared tail bound and a union bound, $\|\frac{1}{n} D - \mathrm{Id}\| \to 0$, so the empirical spectral distribution of $\frac{a}{n} V$ converges weakly almost surely to $a(\mu_{MP,\gamma} - 1)$. Furthermore, the maximal distance between an eigenvalue of $\frac{a}{n} V$ and the support of $a(\mu_{MP,\gamma} - 1)$ converges to 0 almost surely by the results of [BT] and [;2J].

Let $V = O \Lambda O^T$ where $O$ is the real orthogonal matrix that diagonalizes $V$. Then the spectrum of $M$ is the same as that of $\sqrt{\frac{\gamma(\nu - a^2)}{p}}\, O^T W O + \frac{a}{n} \Lambda$, and $O^T W O$ is still distributed as the GUE. Conditional on $V$, the above arguments and Proposition 8.1 of [16] imply

$$\left\| \sqrt{\frac{\gamma(\nu - a^2)}{p}}\, O^T W O + \frac{a}{n} \Lambda \right\| \to \|\mu_{a,\nu,\gamma}\|$$

almost surely. As this convergence holds almost surely in $V$, it holds unconditionally as well. □
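For readers who want to experiment numerically, the matrix $M$ recalled in the proof above can be sampled directly. The following is a minimal NumPy sketch, not part of the paper: the function names are hypothetical, $\gamma$ is taken to be exactly $p/n$, and $W$ is normalized as a GUE matrix with $N(0,1)$ diagonal entries and $\mathbb{E}|w_{ij}|^2 = 1$ off the diagonal (an assumption about the normalization). It also evaluates the elementary triangle-inequality bound $\|M\| \leq \sqrt{\gamma(\nu - a^2)/p}\,\|W\| + \frac{|a|}{n}\|ZZ^T\| + \frac{|a|}{n}\max_i \|Z_i\|_2^2$.

```python
import numpy as np


def sample_deformed_gue(p, n, a, nu, rng):
    """Sample M = sqrt(gamma (nu - a^2) / p) W + (a / n) (Z Z^T - diag(||Z_i||^2)).

    Assumptions (not from the paper): gamma = p / n exactly, nu >= a^2, and W is
    GUE with N(0,1) diagonal and unit-variance complex off-diagonal entries.
    """
    gamma = p / n
    g = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
    W = (g + g.conj().T) / 2            # Hermitian; diagonal entries are real N(0,1)
    Z = rng.standard_normal((p, n))     # rows Z_i in R^n
    V = Z @ Z.T - np.diag(np.sum(Z**2, axis=1))
    M = np.sqrt(gamma * (nu - a**2) / p) * W + (a / n) * V
    return M, W, Z


def spectral_norm(H):
    """Spectral norm of a Hermitian matrix via its eigenvalues."""
    return float(np.max(np.abs(np.linalg.eigvalsh(H))))


def triangle_bound(W, Z, a, nu):
    """Triangle-inequality upper bound on ||M|| in terms of ||W||, ||Z Z^T||, max ||Z_i||^2."""
    p, n = Z.shape
    gamma = p / n
    zz_norm = np.linalg.norm(Z, 2) ** 2          # ||Z Z^T|| = ||Z||^2
    max_row = float(np.max(np.sum(Z**2, axis=1)))
    return np.sqrt(gamma * (nu - a**2) / p) * spectral_norm(W) + abs(a) / n * (zz_norm + max_row)


rng = np.random.default_rng(0)
M, W, Z = sample_deformed_gue(p=200, n=400, a=0.5, nu=1.0, rng=rng)
print(spectral_norm(M), triangle_bound(W, Z, a=0.5, nu=1.0))
```

The values of `a` and `nu` here are arbitrary illustrative choices; the printed norm varies with the random seed and is not a substitute for the limit $\|\mu_{a,\nu,\gamma}\|$.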


Lemma B.2. Suppose $n, p \to \infty$ with $p/n \to \gamma$, let $l := l(n)$ be such that $l(n)/n \to 0$, and let $B_n$ be any event. Then there exist positive constants $C := C_{a,\nu,\gamma}$ and $c := c_{a,\nu,\gamma}$ such that $\mathbb{E}[\|M\|^l 1\{B_n\}] \leq C^l P[B_n] + e^{-cn}$ for all large $n$.


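Proof. Note

$$\|M\| \leq \sqrt{\frac{\gamma(\nu - a^2)}{p}}\, \|W\| + \frac{a}{n} \|ZZ^T\| + \frac{a}{n} \max_{1 \leq i \leq p} \|Z_i\|_2^2.$$

Applying standard tail bounds (e.g. Corollary 2.3.5 of [M], Corollary 5.35 of [56], and Lemma 1 of [38]), there exist constants $C, \varepsilon > 0$ depending on $a, \nu, \gamma$ such that, for all $t \geq C$ and sufficiently large $n$, $P[\|M\| > t] \leq e^{-\varepsilon t n}$. Then we may write

$$\begin{aligned}
\mathbb{E}\left[\|M\|^l 1\{B_n\}\right] &= \mathbb{E}\left[\|M\|^l 1\{B_n\} 1\{\|M\| \leq C\}\right] + \mathbb{E}\left[\|M\|^l 1\{B_n\} 1\{\|M\| > C\}\right] \\
&\leq C^l P[B_n] + \int_{C^l}^\infty P\left[\|M\|^l > t\right] dt \\
&= C^l P[B_n] + \int_C^\infty P[\|M\| > s] \cdot l s^{l-1}\, ds \\
&\leq C^l P[B_n] + l \int_C^\infty e^{-\varepsilon s n + (l-1)\log s}\, ds \\
&\leq C^l P[B_n] + l \int_C^\infty e^{-(\varepsilon n - l)s}\, ds \\
&= C^l P[B_n] + \frac{l}{\varepsilon n - l}\, e^{-(\varepsilon n - l)C}
\end{aligned}$$

for all large $n$. As $l = o(n)$, the result follows upon setting $c = C\varepsilon/2$.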
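□

Lemma B.3. Suppose $n, p \to \infty$ with $p/n \to \gamma$. Then $\mathbb{E}[\|M\|] \to \|\mu_{a,\nu,\gamma}\|$.

Proof. Lemma B.1 and Fatou's lemma imply $\liminf \mathbb{E}[\|M\|] \geq \|\mu_{a,\nu,\gamma}\|$. For any $\varepsilon > 0$, let $B_n = \{\|M\| > \|\mu_{a,\nu,\gamma}\| + \varepsilon\}$. Then

$$\mathbb{E}[\|M\|] = \mathbb{E}[\|M\| 1\{B_n^c\}] + \mathbb{E}[\|M\| 1\{B_n\}] \leq \|\mu_{a,\nu,\gamma}\| + \varepsilon + \mathbb{E}[\|M\| 1\{B_n\}].$$

Lemma B.1 implies $P[B_n] \to 0$, so Lemma B.2 (with $l = 1$) implies $\mathbb{E}[\|M\| 1\{B_n\}] \to 0$ as well. Then $\mathbb{E}[\|M\|] \leq \|\mu_{a,\nu,\gamma}\| + 2\varepsilon$ for all large $n$, and the result follows by taking $\varepsilon \to 0$. □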

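Lemma B.4. Suppose $F : \mathbb{R}^k \to \mathbb{R}$ is $L$-Lipschitz on a set $G \subseteq \mathbb{R}^k$, i.e. $|F(x) - F(y)| \leq L\|x - y\|_2$ for all $x, y \in G$. Let $\xi \sim \mathcal{N}(0, \mathrm{Id})$. Then there exists a function $\tilde{F} : \mathbb{R}^k \to \mathbb{R}$ such that $\tilde{F}(x) = F(x)$ for all $x \in G$, $|\tilde{F}(x) - \tilde{F}(y)| \leq L\|x - y\|_2$ for all $x, y \in \mathbb{R}^k$, and, for all $\lambda > 0$,

$$P\left[F(\xi) - \mathbb{E}F(\xi) \geq \lambda + |\mathbb{E}F(\xi) - \mathbb{E}\tilde{F}(\xi)| \text{ and } \xi \in G\right] \leq e^{-\frac{\lambda^2}{2L^2}}.$$

Proof. Let $\tilde{F}(x) = \inf_{x' \in G}\left(F(x') + L\|x - x'\|_2\right)$. Note that if $x \in G$, then $F(x) \leq F(x') + L\|x - x'\|_2$ for all $x' \in G$, so $\tilde{F}(x) = F(x)$. Also, for any $x, y \in \mathbb{R}^k$ and $\varepsilon > 0$, there exists $x' \in G$ such that $\tilde{F}(x) \geq F(x') + L\|x - x'\|_2 - \varepsilon$. Then by definition, $\tilde{F}(y) \leq F(x') + L\|y - x'\|_2$, so $\tilde{F}(y) - \tilde{F}(x) \leq L\|y - x'\|_2 - L\|x - x'\|_2 + \varepsilon \leq L\|x - y\|_2 + \varepsilon$. Similarly, $\tilde{F}(x) - \tilde{F}(y) \leq L\|x - y\|_2 + \varepsilon$. This holds for all $\varepsilon > 0$, so $|\tilde{F}(x) - \tilde{F}(y)| \leq L\|x - y\|_2$. Finally, applying Gaussian concentration of measure for the Lipschitz function $\tilde{F}$,

$$\begin{aligned}
P\left[F(\xi) - \mathbb{E}F(\xi) \geq \lambda + |\mathbb{E}F(\xi) - \mathbb{E}\tilde{F}(\xi)| \text{ and } \xi \in G\right]
&= P\left[\tilde{F}(\xi) \geq \lambda + |\mathbb{E}F(\xi) - \mathbb{E}\tilde{F}(\xi)| + \mathbb{E}F(\xi) \text{ and } \xi \in G\right] \\
&\leq P\left[\tilde{F}(\xi) \geq \lambda + \mathbb{E}\tilde{F}(\xi)\right] \leq e^{-\frac{\lambda^2}{2L^2}}.
\end{aligned}$$

□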
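The extension $\tilde{F}(x) = \inf_{x' \in G}(F(x') + L\|x - x'\|_2)$ used in the proof of Lemma B.4 is easy to evaluate when $G$ is a finite set, since the infimum becomes a minimum. The following is a minimal NumPy sketch under that assumption (a finite $G$ is an illustrative simplification, and the name `lipschitz_extension` is hypothetical); it shows that the extension agrees with $F$ on $G$ and remains $L$-Lipschitz elsewhere.

```python
import numpy as np


def lipschitz_extension(F_on_G, G_points, L):
    """Return F_tilde(x) = min over x' in G of F(x') + L * ||x - x'||_2.

    G_points: array of shape (m, k) listing the points of a finite set G.
    F_on_G:   array of shape (m,) with the values of F on those points,
              assumed L-Lipschitz on G.
    """
    G_points = np.asarray(G_points, dtype=float)
    F_on_G = np.asarray(F_on_G, dtype=float)

    def F_tilde(x):
        x = np.asarray(x, dtype=float)
        dists = np.linalg.norm(G_points - x, axis=1)  # ||x - x'||_2 for each x' in G
        return float(np.min(F_on_G + L * dists))

    return F_tilde


# Example: F(x) = ||x||_2 is 1-Lipschitz, sampled on 20 random points of R^3.
rng = np.random.default_rng(1)
G = rng.standard_normal((20, 3))
F_vals = np.linalg.norm(G, axis=1)
F_tilde = lipschitz_extension(F_vals, G, L=1.0)

x, y = rng.standard_normal(3), rng.standard_normal(3)
print(abs(F_tilde(x) - F_tilde(y)) <= np.linalg.norm(x - y) + 1e-12)  # Lipschitz check
print(max(abs(F_tilde(g) - f) for g, f in zip(G, F_vals)) < 1e-9)     # agreement on G
```

In the application to Lemma B.5 the set $G$ is infinite, so this sketch only illustrates the construction and is not a way to compute the extension used there.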


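Lemma B.5. Suppose $n, p \to \infty$ with $p/n \to \gamma$, and let $\varepsilon > 0$. Then there exist $c := c_{a,\nu,\gamma} > 0$ and $N := N_{a,\nu,\gamma,\varepsilon} > 0$ and a set $G := G_{n,p} \subset \mathbb{R}^{p \times n}$ with $P[Z \in G] \geq 1 - 2e^{-n/2}$, such that for all $t > \varepsilon$ and $n > N$,

$$P\left[\|M\| \geq \|\mu_{a,\nu,\gamma}\| + t \text{ and } Z \in G\right] \leq e^{-cnt^2}.$$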

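Proof. Recall $M = \sqrt{\frac{\gamma(\nu - a^2)}{p}}\, W + \frac{a}{n}\left(ZZ^T - \operatorname{diag}(\|Z_i\|_2^2)\right)$. Denote

$$\mathcal{W} = \left((w_{ii})_{1 \leq i \leq p},\ (\sqrt{2} \operatorname{Re} w_{ij}, \sqrt{2} \operatorname{Im} w_{ij})_{1 \leq i < j \leq p}\right) \in \mathbb{R}^{p^2},$$

so that the entries of $\mathcal{W}$ and $Z$ are IID $\mathcal{N}(0, 1)$. Define $f : \mathbb{R}^{p^2 + np} \to \mathbb{R}$ and $f_v : \mathbb{R}^{p^2 + np} \to \mathbb{R}$ for $v \in \mathbb{C}^p$ by $f(\mathcal{W}, Z) = \|M\|$ and $f_v(\mathcal{W}, Z) = v^* M v$, so that $f(\mathcal{W}, Z) = \sup_{v \in \mathbb{C}^p : \|v\|_2 = 1} |f_v(\mathcal{W}, Z)|$. By elementary calculations, and denoting $Z_i$ as the $i$th row of $Z$,

$$\frac{\partial f_v(\mathcal{W}, Z)}{\partial w_{ii}} = \sqrt{\frac{\gamma(\nu - a^2)}{p}}\, |v_i|^2, \qquad \frac{\partial f_v(\mathcal{W}, Z)}{\partial(\sqrt{2} \operatorname{Re} w_{ij})} = \sqrt{\frac{2\gamma(\nu - a^2)}{p}}\, \operatorname{Re}(\bar{v}_i v_j),$$

$$\frac{\partial f_v(\mathcal{W}, Z)}{\partial(\sqrt{2} \operatorname{Im} w_{ij})} = \sqrt{\frac{2\gamma(\nu - a^2)}{p}}\, \operatorname{Im}(v_i \bar{v}_j), \qquad \nabla_{Z_i} f_v(\mathcal{W}, Z) = \frac{2a}{n} \sum_{j \neq i} \operatorname{Re}(\bar{v}_i v_j) Z_j.$$

Then, for any $v \in \mathbb{C}^p$ such that $\|v\|_2 = 1$,

$$\begin{aligned}
\|\nabla f_v(\mathcal{W}, Z)\|_2^2 &= \frac{\gamma(\nu - a^2)}{p} \left(\sum_{i=1}^p |v_i|^4 + 2 \sum_{i < j} |v_i v_j|^2\right) + \frac{4a^2}{n^2} \sum_{i=1}^p \Big\| \sum_{j \neq i} \operatorname{Re}(\bar{v}_i v_j) Z_j \Big\|_2^2 \\
&\leq \frac{\gamma(\nu - a^2)}{p} \left(\sum_{i=1}^p |v_i|^2\right)^2 + \frac{4a^2}{n^2} \sum_{i=1}^p |v_i|^2 \|Z\|^2 \|v\|_2^2 \\
&= \frac{\gamma(\nu - a^2)}{p} + \frac{4a^2 \|Z\|^2}{n^2}.
\end{aligned}$$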

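Take $G = \{Z \in \mathbb{R}^{p \times n} : \|Z\| \leq 2\sqrt{n} + \sqrt{p}\}$. Then by Corollary 5.35 of [56], $P[Z \notin G] \leq 2e^{-n/2}$. As $\mathbb{R}^{p^2} \times G$ is convex, the above inequality implies $f_v(\mathcal{W}, Z)$ is $L$-Lipschitz on $\mathbb{R}^{p^2} \times G$ for $L = O(n^{-1/2})$. Then

$$\begin{aligned}
f(\mathcal{W}, Z) - f(\mathcal{W}', Z') &\leq \sup_{v \in \mathbb{C}^p : \|v\|_2 = 1} \left(|f_v(\mathcal{W}, Z)| - |f_v(\mathcal{W}', Z')|\right) \\
&\leq \sup_{v \in \mathbb{C}^p : \|v\|_2 = 1} |f_v(\mathcal{W}, Z) - f_v(\mathcal{W}', Z')| \leq L \|(\mathcal{W}, Z) - (\mathcal{W}', Z')\|_2
\end{aligned}$$

for all $\mathcal{W}, \mathcal{W}' \in \mathbb{R}^{p^2}$ and $Z, Z' \in G$, so $f$ is also $L$-Lipschitz on $\mathbb{R}^{p^2} \times G$.

Let $\tilde{f} : \mathbb{R}^{p^2 + np} \to \mathbb{R}$ be the $L$-Lipschitz extension of $f$ on $\mathbb{R}^{p^2} \times G$ given by Lemma B.4. Note that

$$\begin{aligned}
|\mathbb{E} f(\mathcal{W}, Z) - \mathbb{E} \tilde{f}(\mathcal{W}, Z)| &= \left|\mathbb{E}\left[(f(\mathcal{W}, Z) - \tilde{f}(\mathcal{W}, Z)) 1\{Z \notin G\}\right]\right| \\
&\leq \mathbb{E}|f(\mathcal{W}, Z) 1\{Z \notin G\}| + \mathbb{E}|\tilde{f}(\mathcal{W}, Z) 1\{Z \notin G\}|.
\end{aligned}$$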
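Lemma B.2 (with $l = 1$) implies $\mathbb{E}|f(\mathcal{W}, Z) 1\{Z \notin G\}| = \mathbb{E}[\|M\| 1\{Z \notin G\}] = o(1)$. As $\tilde{f}$ is $L$-Lipschitz,

$$|\tilde{f}(\mathcal{W}, Z)| \leq |\tilde{f}(0, 0)| + L\|(\mathcal{W}, Z)\|_2 = |f(0, 0)| + L\|(\mathcal{W}, Z)\|_2 = L\|(\mathcal{W}, Z)\|_2.$$

Let $A_n = \left\{\|(\mathcal{W}, Z)\|_2 \leq \sqrt{2(p^2 + np)}\right\}$. As $\|(\mathcal{W}, Z)\|_2^2$ is chi-squared distributed with $p^2 + np$ degrees of freedom, a standard tail bound gives $P\left[\|(\mathcal{W}, Z)\|_2^2 \geq p^2 + np + t\right] \leq e^{-\frac{t^2}{8(p^2 + np)}}$. Then, integrating this tail bound over $t \geq p^2 + np$ with the substitution $t = 2\sqrt{p^2 + np}\, s$,

$$\int_{p^2 + np}^\infty e^{-\frac{t^2}{8(p^2 + np)}}\, dt = 2\sqrt{p^2 + np} \int_{\frac{\sqrt{p^2 + np}}{2}}^\infty e^{-\frac{s^2}{2}}\, ds \leq 4 e^{-\frac{p^2 + np}{8}},$$

so $\mathbb{E}\left[\|(\mathcal{W}, Z)\|_2^2\, 1\{A_n^c\}\right]$ is exponentially small in $p^2 + np$. This implies

$$\begin{aligned}
\mathbb{E}|\tilde{f}(\mathcal{W}, Z) 1\{Z \notin G\}| &\leq \mathbb{E}\left[|\tilde{f}(\mathcal{W}, Z)| 1\{Z \notin G\} 1\{A_n\}\right] + \mathbb{E}\left[|\tilde{f}(\mathcal{W}, Z)| 1\{Z \notin G\} 1\{A_n^c\}\right] \\
&\leq L\sqrt{2(p^2 + np)}\, P[Z \notin G] + L\, \mathbb{E}\left[\|(\mathcal{W}, Z)\|_2^2\, 1\{A_n^c\}\right]^{1/2} = o(1).
\end{aligned}$$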

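Then $|\mathbb{E} f(\mathcal{W}, Z) - \mathbb{E} \tilde{f}(\mathcal{W}, Z)| = o(1)$, so Lemmas B.3 and B.4 imply, for all $t > \varepsilon$ and all sufficiently large $n$ (i.e. $n > N_{a,\nu,\gamma,\varepsilon}$ independent of $t$),

$$\begin{aligned}
P\left[\|M_{n,p}\| \geq \|\mu_{a,\nu,\gamma}\| + t \text{ and } Z \in G\right]
&\leq P\left[\|M_{n,p}\| - \mathbb{E}\|M_{n,p}\| \geq t - \frac{\varepsilon}{2} + |\mathbb{E} f(\mathcal{W}, Z) - \mathbb{E} \tilde{f}(\mathcal{W}, Z)| \text{ and } Z \in G\right] \\
&\leq e^{-\frac{(t - \varepsilon/2)^2}{2L^2}} \leq e^{-\frac{t^2}{8L^2}}.
\end{aligned}$$

The result follows upon noting that $L = O(n^{-1/2})$. □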


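Proof of Proposition 5.11. Let $c > 0$ and $G \subset \mathbb{R}^{p \times n}$ be as in Lemma B.5. Then, for any $\varepsilon > 0$,

$$\begin{aligned}
\mathbb{E}\left[\|M\|^l 1\{Z \in G\}\right] &\leq (\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l + \mathbb{E}\left[\|M\|^l 1\{\|M\| > \|\mu_{a,\nu,\gamma}\| + \varepsilon\} 1\{Z \in G\}\right] \\
&= (\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l + \int_{(\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l}^\infty P\left[\|M\|^l \geq t \text{ and } Z \in G\right] dt \\
&= (\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l + \int_{\|\mu_{a,\nu,\gamma}\| + \varepsilon}^\infty P\left[\|M\| \geq s \text{ and } Z \in G\right] \cdot l s^{l-1}\, ds \\
&\leq (\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l + l \int_\varepsilon^\infty e^{-cns^2} (\|\mu_{a,\nu,\gamma}\| + s)^{l-1}\, ds
\end{aligned}$$

for all sufficiently large $n$, where we have applied Lemma B.5. Note that

$$\begin{aligned}
l \int_\varepsilon^\infty e^{-cns^2} (\|\mu_{a,\nu,\gamma}\| + s)^{l-1}\, ds &\leq l \int_\varepsilon^\infty e^{-cns^2 + l(\|\mu_{a,\nu,\gamma}\| + s)}\, ds \\
&= l\, e^{l\|\mu_{a,\nu,\gamma}\| + \frac{l^2}{4cn}} \int_\varepsilon^\infty e^{-cn\left(s - \frac{l}{2cn}\right)^2}\, ds \\
&= \frac{l}{\sqrt{2cn}}\, e^{l\|\mu_{a,\nu,\gamma}\| + \frac{l^2}{4cn}} \int_{\sqrt{2cn}\left(\varepsilon - \frac{l}{2cn}\right)}^\infty e^{-\frac{t^2}{2}}\, dt \\
&\leq \frac{l\, e^{l\|\mu_{a,\nu,\gamma}\| + \frac{l^2}{4cn}}}{2cn\left(\varepsilon - \frac{l}{2cn}\right)}\, e^{-cn\left(\varepsilon - \frac{l}{2cn}\right)^2} \to 0
\end{aligned}$$

for $l = O(\log n)$, so $\mathbb{E}[\|M\|^l 1\{Z \in G\}] \leq (\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l + o(1)$. On the other hand, $P[Z \notin G] \leq 2e^{-n/2}$ by Lemma B.5, so Lemma B.2 implies $\mathbb{E}[\|M\|^l 1\{Z \notin G\}] = o(1)$ for $l = O(\log n)$. Hence $\mathbb{E}[\|M\|^l] \leq (\|\mu_{a,\nu,\gamma}\| + \varepsilon)^l + o(1)$, and taking $\varepsilon \to 0$ concludes the proof. □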